Closed
Bug 1306860
Opened 8 years ago
Closed 7 years ago
Intermittent-infra l10n nightly command timed out: 10800 seconds elapsed running ['c:/mozilla-build/python27/python', '-u', 'scripts/scripts/desktop_l10n.py', '--environment-config', 'single_locale/production.py', '--branch-config', 'single_locale/mozilla
Categories
(Release Engineering :: General, defect)
Release Engineering
General
Tracking
(Not tracked)
RESOLVED
INCOMPLETE
People
(Reporter: intermittent-bug-filer, Unassigned)
References
Details
(Keywords: bulk-close-intermittents, intermittent-failure)
Attachments
(1 file)
1.02 KB,
patch
|
rail
:
review+
bhearsum
:
checked-in+
|
Details | Diff | Splinter Review |
Filed by: philringnalda [at] gmail.com https://treeherder.mozilla.org/logviewer.html#?job_id=3711227&repo=mozilla-aurora https://archive.mozilla.org/pub/firefox/nightly/2016/10/2016-10-01-00-40-00-mozilla-aurora-l10n/mozilla-aurora-win32-l10n-nightly-2-unknown-bm70-build1-build7.txt.gz
Comment 1•8 years ago
|
||
Ok, so we have two problems here: (1) - b-2008 spots don't actually kill buildbot when they time out. Theres a bug on file, and its reasonably important, n-i to markco to comment here what bug that is. (2) - We have a *lot* of retries to balrog in this log... That probably contributes a LOT to the potential timeout here... (some of the log excerpt follows) Ben is there any insight into how we can shore up issue #2 here a bit (either on client side, or server side, to avoid the excess time-spent) 06:17:11 INFO - "GET /api/releases/Firefox-mozilla-aurora-nightly-latest/builds/WINNT_x86-msvc/fa HTTP/1.1" 200 1838 06:17:11 INFO - REQUEST STATS: {"url": "https://aus4-admin.mozilla.org/api/releases/Firefox-mozilla-aurora-nightly-latest/builds/WINNT_x86-msvc/fa", "timestamp": 1475327831.498, "method": "GET", "elapsed_secs": 1.184999942779541, "status_code": 200} 06:17:11 INFO - Balrog request to https://aus4-admin.mozilla.org/api/releases/Firefox-mozilla-aurora-nightly-20161001004000 06:17:11 INFO - Data sent: None 06:17:11 INFO - Starting new HTTPS connection (1): aus4-admin.mozilla.org 06:17:13 INFO - "GET /api/releases/Firefox-mozilla-aurora-nightly-20161001004000 HTTP/1.1" 200 658908 06:17:14 INFO - REQUEST STATS: {"url": "https://aus4-admin.mozilla.org/api/releases/Firefox-mozilla-aurora-nightly-20161001004000", "timestamp": 1475327834.162, "method": "GET", "elapsed_secs": 2.6610000133514404, "status_code": 200} 06:17:14 INFO - Balrog request to https://aus4-admin.mozilla.org/api/releases/Firefox-mozilla-aurora-nightly-20161001004000/builds/WINNT_x86-msvc/fa 06:17:14 INFO - Data sent: None 06:17:15 INFO - "GET /api/releases/Firefox-mozilla-aurora-nightly-20161001004000/builds/WINNT_x86-msvc/fa HTTP/1.1" 200 454 06:17:15 INFO - REQUEST STATS: {"url": "https://aus4-admin.mozilla.org/api/releases/Firefox-mozilla-aurora-nightly-20161001004000/builds/WINNT_x86-msvc/fa", "timestamp": 1475327835.082, "method": "GET", "elapsed_secs": 0.9119999408721924, "status_code": 200} 06:17:15 INFO - Balrog request to https://aus4-admin.mozilla.org/api/releases/Firefox-mozilla-aurora-nightly-latest 06:17:15 INFO - Data sent: None 06:17:15 INFO - "HEAD /api/releases/Firefox-mozilla-aurora-nightly-latest HTTP/1.1" 200 0 06:17:15 INFO - REQUEST STATS: {"url": "https://aus4-admin.mozilla.org/api/releases/Firefox-mozilla-aurora-nightly-latest", "timestamp": 1475327835.842, "method": "HEAD", "elapsed_secs": 0.7579998970031738, "status_code": 200} 06:17:15 INFO - Balrog request to https://aus4-admin.mozilla.org/api/releases/Firefox-mozilla-aurora-nightly-latest/builds/WINNT_x86-msvc/fa 06:17:15 INFO - Data sent: {'alias': '["WINNT_x86-msvc-x86", "WINNT_x86-msvc-x64"]', 'product': u'Firefox', 'hashFunction': u'sha512', 'data_version': '334039', 'data': '{"buildID": "20161001004000", "platformVersion": "51.0a2", "displayVersion": "51.0a2", "completes": [{"fileUrl": "https://archive.mozilla.org/pub/firefox/nightly/2016/10/2016-10-01-00-40-00-mozilla-aurora-l10n/firefox-51.0a2.fa.win32.complete.mar", "from": "*", "hashValue": "f84541c771e99c7f6c760b61b153644967b4e8a9b4ca26e1f2cb3cac0faa4eb60f8951417685ce276d7bfeb3b9c7c91dba860d85a71e3adb5a1606370d918101", "filesize": 54580969}], "appVersion": "51.0a2"}', 'schema_version': 4} 06:17:17 INFO - "PUT /api/releases/Firefox-mozilla-aurora-nightly-latest/builds/WINNT_x86-msvc/fa HTTP/1.1" 400 87 06:17:17 INFO - Caught HTTPError: {"data": ["Failed to update row, old_data_version doesn't match current data_version"]} 06:17:17 INFO - REQUEST STATS: {"url": "https://aus4-admin.mozilla.org/api/releases/Firefox-mozilla-aurora-nightly-latest/builds/WINNT_x86-msvc/fa", "timestamp": 1475327837.714, "method": "PUT", "elapsed_secs": 1.871000051498413, "status_code": 400} 06:17:17 INFO - retry: Caught exception: 06:17:17 INFO - Traceback (most recent call last): 06:17:17 INFO - File "c:\builds\moz2_slave\m-aurora-w32-l10n-ntly-2-00000\build\tools\lib\python\vendor\redo-1.4.1\redo\__init__.py", line 152, in retry 06:17:17 INFO - return action(*args, **kwargs) 06:17:17 INFO - File "c:\builds\moz2_slave\m-aurora-w32-l10n-ntly-2-00000\build\tools\scripts\updates\../../lib/python\balrog\submitter\cli.py", line 338, in update_latest 06:17:17 INFO - data_version=latest_data_version) 06:17:17 INFO - File "c:\builds\moz2_slave\m-aurora-w32-l10n-ntly-2-00000\build\tools\scripts\updates\../../lib/python\balrog\submitter\api.py", line 216, in update_build 06:17:17 INFO - return self.request(method='PUT', data=data) 06:17:17 INFO - File "c:\builds\moz2_slave\m-aurora-w32-l10n-ntly-2-00000\build\tools\scripts\updates\../../lib/python\balrog\submitter\api.py", line 103, in request 06:17:17 INFO - return self.do_request(url, data, method) 06:17:17 INFO - File "c:\builds\moz2_slave\m-aurora-w32-l10n-ntly-2-00000\build\tools\scripts\updates\../../lib/python\balrog\submitter\api.py", line 121, in do_request 06:17:17 INFO - req.raise_for_status() 06:17:17 INFO - File "c:\builds\moz2_slave\m-aurora-w32-l10n-ntly-2-00000\build\tools\scripts\updates\../../lib/python/vendor/requests-2.7.0\requests\models.py", line 851, in raise_for_status 06:17:17 INFO - raise HTTPError(http_error_msg, response=self) 06:17:17 INFO - HTTPError: 400 Client Error: BAD REQUEST 06:17:17 INFO - sleeping for 2.00s (attempt 2/30) 06:17:19 INFO - attempt 3/30 06:17:19 INFO - retry: Calling <function update_latest at 0x02C3E1F0> with args: (), kwargs: {}, attempt #3 06:17:19 INFO - Balrog request to https://aus4-admin.mozilla.org/api/releases/Firefox-mozilla-aurora-nightly-latest 06:17:19 INFO - Data sent: None 06:17:19 INFO - Starting new HTTPS connection (1): aus4-admin.mozilla.org 06:17:20 INFO - "GET /api/releases/Firefox-mozilla-aurora-nightly-latest HTTP/1.1" 200 968131 command timed out: 10800 seconds elapsed running ['c:/mozilla-build/python27/python', '-u', 'scripts/scripts/desktop_l10n.py', '--environment-config', 'single_locale/production.py', '--branch-config', 'single_locale/mozilla-aurora.py', '--platform-config', 'single_locale/win32.py', '--total-chunks', '6', '--this-chunk', '2', '--balrog-config', 'balrog/production.py'], attempting to kill 06:17:21 INFO - REQUEST STATS: {"url": "https://aus4-admin.mozilla.org/api/releases/Firefox-mozilla-aurora-nightly-latest", "timestamp": 1475327841.646, "method": "GET", "elapsed_secs": 1.930999994277954, "status_code": 200} 06:17:21 INFO - Balrog request to https://aus4-admin.mozilla.org/api/releases/Firefox-mozilla-aurora-nightly-latest/builds/WINNT_x86-msvc/fa 06:17:21 INFO - Data sent: None 06:17:22 INFO - "GET /api/releases/Firefox-mozilla-aurora-nightly-latest/builds/WINNT_x86-msvc/fa HTTP/1.1" 200 1838 06:17:22 INFO - REQUEST STATS: {"url": "https://aus4-admin.mozilla.org/api/releases/Firefox-mozilla-aurora-nightly-latest/builds/WINNT_x86-msvc/fa", "timestamp": 1475327842.469, "method": "GET", "elapsed_secs": 0.8110001087188721, "status_code": 200} 06:17:22 INFO - Balrog request to https://aus4-admin.mozilla.org/api/releases/Firefox-mozilla-aurora-nightly-20161001004000 06:17:22 INFO - Data sent: None
Flags: needinfo?(mcornmesser)
Flags: needinfo?(bhearsum)
Comment 2•8 years ago
|
||
There's probably more retries because we recently reduced sleep time and increased attempts when submitting to Balrog. The fix for this is balrogworker, and we've been trying to marginally improve the situation in https://bugzilla.mozilla.org/show_bug.cgi?id=1284516 in the meantime.
Flags: needinfo?(bhearsum)
Comment 3•8 years ago
|
||
One thing that might help here is to reduce the # of retries. In https://github.com/mozilla/build-tools/commit/baaf2fb71a1deedc2206413d5eeafee29ae11272, we switched from 5 attempts w ~30s sleep between them -> 30 attempts with 2s sleep. That ended up increasing the theoretical max sleep time, which may have pushed us over here. Let's try 10 attempts max, which is still twice the number we had before.
Attachment #8797175 -
Flags: review?(rail)
Comment 4•8 years ago
|
||
My thoughts from IRC to immortalize in this bug... 11:13 AM <Callek> bhearsum: sooo I'm thinking the likely "before" case was "something got a data version mismatch, and waits 30sec, <other job> doesn't and manages to churn through a bunch of submissions in short succession --" thus meaning that the 2sec sleep is actually hurting us when coupled with tcp/web-app overhead 11:14 AM <Callek> so I'm not sure dropping to 10 retries will solve more than causing them to fail for failing to submit, rather than fail for timeout 11:14 AM <Callek> (mostly said based on my own informed intuition, rather than on hard data) 11:17 AM <•bhearsum> you could be onto something there 11:17 AM <•bhearsum> honestly, i'm just throwing things at the wall 11:18 AM <•bhearsum> it's so difficult to reason about what's going to happen with so many variables in the mix 11:18 AM <•bhearsum> if you want to try that feel free to toss up a patch, otherwise we'll see what happens with mine first 11:18 AM <Callek> I'm happy to see how well yours takes and go from there
Updated•8 years ago
|
Attachment #8797175 -
Flags: review?(rail) → review+
Updated•8 years ago
|
Attachment #8797175 -
Flags: checked-in+
Comment 6•8 years ago
|
||
Done in remote: https://hg.mozilla.org/build/puppet/rev/4a788524c183c5a3f61ea2dc72f5018870afed02 remote: https://hg.mozilla.org/build/puppet/rev/618f5d39bc222dd1e1f8c7c90e96b74dab1117a2
Flags: needinfo?(rail)
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Status: NEW → RESOLVED
Closed: 7 years ago
Keywords: bulk-close-intermittents
Resolution: --- → INCOMPLETE
Assignee | ||
Updated•6 years ago
|
Component: General Automation → General
You need to log in
before you can comment on or make changes to this bug.
Description
•