Open
Bug 1286318
Opened 9 years ago
Updated 3 months ago
race condition caused by automatic data_version setting in balrog api client
Categories
(Release Engineering :: Release Automation, defect)
Release Engineering
Release Automation
Tracking
(Not tracked)
NEW
People
(Reporter: bhearsum, Unassigned)
Details
tl;dr - automatic data_version setting in https://github.com/mozilla/build-tools/blob/master/lib/python/balrog/submitter/api.py has a race condition that can happen when two clients try to submit a new release at the same time.
I think the right fix here is to remove the magic data_version setting from api.py and force callers to deal with it. Code as low level as that is unlikely to have the necessary information to know whether or not the data_version it receives from the server is correct or not.
More details:
We had two Funsize tasks doing submission around the same time:
1)
Client log:
2016-07-11 19:45:08,957 - DEBUG - "GET /api/releases/Firefox-48.0b7-build1 HTTP/1.1" 404 0
2016-07-11 19:45:08,979 - DEBUG - "HEAD /api/releases/Firefox-48.0b7-build1 HTTP/1.1" 404 0
2016-07-11 19:45:08,997 - DEBUG - "HEAD /api/csrf_token HTTP/1.1" 200 0
2016-07-11 19:45:08,998 - DEBUG - Balrog request to http://balrog/api/releases/Firefox-48.0b7-build1/builds/WINNT_x86-msvc/en-US
2016-07-11 19:45:08,998 - DEBUG - Data sent: {'data': '{"buildID": "20160711002726", "appVersion": "48.0", "displayVersion": "48.0 Beta 7",
"partials": [{"hashValue": "cd2ed3a8bd8f1141c02cc9ef30f07b057544923585cccb67c5d4a91631e47cc2e55a2136a10a982cbf6a0b533ab4de5f1b2d1749a135b6
28c29f3a640c3025c6", "from": "Firefox-48.0b5-build1", "filesize": 11388739}], "platformVersion": "48.0", "completes": [{"hashValue": "08bc5
bd775de65749fe1500478966a04b973b8f4b5761be621a18fe4df38721d9143f11ef81942f3da9e793d8e0a31187e15ad35702ab1b8774b2e149b7588c2", "from": "*",
"filesize": 53729269}]}', 'product': u'Firefox', 'hashFunction': 'sha512', 'schema_version': '4'}
2016-07-11 19:45:09,064 - DEBUG - "PUT /api/releases/Firefox-48.0b7-build1/builds/WINNT_x86-msvc/en-US HTTP/1.1" 201 23
2016-07-11 19:45:09,065 - DEBUG - REQUEST STATS: {"url": "http://balrog/api/releases/Firefox-48.0b7-build1/builds/WINNT_x86-msvc/en-US", "t
imestamp": 1468266309.065282, "method": "PUT", "elapsed_secs": 0.06684613227844238, "status_code": 201}
Server log:
10.8.75.73 - ffxbld [11/Jul/2016:19:45:08 +0000] "GET /api/releases/Firefox-48.0b7-build1 HTTP/1.0" 404 - "-" "python-requests/2.4.3 CPython/2.7.9 Linux/3.13.0-83-generic"
10.8.75.73 - ffxbld [11/Jul/2016:19:45:08 +0000] "HEAD /api/releases/Firefox-48.0b7-build1 HTTP/1.0" 404 - "-" "python-requests/2.4.3 CPython/2.7.9 Linux/3.13.0-83-generic"
10.8.75.73 - ffxbld [11/Jul/2016:19:45:08 +0000] "HEAD /api/csrf_token HTTP/1.0" 200 - "-" "python-requests/2.4.3 CPython/2.7.9 Linux/3.13.0-83-generic"
10.8.75.73 - ffxbld [11/Jul/2016:19:45:08 +0000] "PUT /api/releases/Firefox-48.0b7-build1/builds/WINNT_x86-msvc/en-US HTTP/1.0" 201 23 "-" "python-requests/2.4.3 CPython/2.7.9 Linux/3.13.0-83-generic"
2)
Client log:
2016-07-11 19:45:09,044 - DEBUG - "GET /api/releases/Firefox-48.0b7-build1 HTTP/1.1" 404 0
2016-07-11 19:45:09,071 - DEBUG - "HEAD /api/releases/Firefox-48.0b7-build1 HTTP/1.1" 200 0
2016-07-11 19:45:09,074 - DEBUG - Data sent: {'data_version': '2', 'product': u'Firefox', 'hashFunction': 'sha512', 'data': '{"buildID": "2
0160711002726", "appVersion": "48.0", "displayVersion": "48.0 Beta 7", "partials": [{"hashValue": "5c02802b876c825c736394c40af962bfbf780c46
4efded64f758fe57fcc7ebbb83ae8898ff26f473ba770ffb4950c60b6c7f73128e20a3b6f63dd7a83865c573", "from": "Firefox-48.0b6-build1", "filesize": 111
39337}], "platformVersion": "48.0", "completes": [{"hashValue": "08bc5bd775de65749fe1500478966a04b973b8f4b5761be621a18fe4df38721d9143f11ef8
1942f3da9e793d8e0a31187e15ad35702ab1b8774b2e149b7588c2", "from": "*", "filesize": 53729269}]}', 'schema_version': '4'}
2016-07-11 19:45:09,129 - DEBUG - "PUT /api/releases/Firefox-48.0b7-build1/builds/WINNT_x86-msvc/en-US HTTP/1.1" 200 23
Server log:
10.8.75.73 - ffxbld [11/Jul/2016:19:45:08 +0000] "GET /api/releases/Firefox-48.0b7-build1 HTTP/1.0" 404 - "-" "python-requests/2.4.3 CPython/2.7.9 Linux/3.13.0-83-generic"
10.8.75.73 - ffxbld [11/Jul/2016:19:45:08 +0000] "HEAD /api/releases/Firefox-48.0b7-build1 HTTP/1.0" 200 - "-" "python-requests/2.4.3 CPython/2.7.9 Linux/3.13.0-83-generic"
10.8.75.73 - ffxbld [11/Jul/2016:19:45:08 +0000] "PUT /api/releases/Firefox-48.0b7-build1/builds/WINNT_x86-msvc/en-US HTTP/1.0" 200 23 "-" "python-requests/2.4.3 CPython/2.7.9 Linux/3.13.0-83-generic"
We ended up with this sequence of blobs:
data_version 1:
{
"hashFunction": "sha512",
"name": "Firefox-48.0b7-build1",
"schema_version": 4
}
data_version 2 (contains the complete + partial from b5):
{
"hashFunction": "sha512",
"name": "Firefox-48.0b7-build1",
"platforms": {
"WINNT_x86-msvc": {
"locales": {
"en-US": {
"appVersion": "48.0",
"buildID": "20160711002726",
"completes": [
{
"filesize": 53729269,
"from": "*",
"hashValue": "08bc5bd775de65749fe1500478966a04b973b8f4b5761be621a18fe4df38721d9143f11ef81942f3da9e793d8e0a31187e15ad35702ab1b8774b2e149b7588c2"
}
],
"displayVersion": "48.0 Beta 7",
"partials": [
{
"filesize": 11388739,
"from": "Firefox-48.0b5-build1",
"hashValue": "cd2ed3a8bd8f1141c02cc9ef30f07b057544923585cccb67c5d4a91631e47cc2e55a2136a10a982cbf6a0b533ab4de5f1b2d1749a135b628c29f3a640c3025c6"
}
],
"platformVersion": "48.0"
}
}
}
},
"schema_version": 4
}
data_version 3 (contains the complete and the partial from b6):
{
"hashFunction": "sha512",
"name": "Firefox-48.0b7-build1",
"platforms": {
"WINNT_x86-msvc": {
"locales": {
"en-US": {
"appVersion": "48.0",
"buildID": "20160711002726",
"completes": [
{
"filesize": 53729269,
"from": "*",
"hashValue": "08bc5bd775de65749fe1500478966a04b973b8f4b5761be621a18fe4df38721d9143f11ef81942f3da9e793d8e0a31187e15ad35702ab1b8774b2e149b7588c2"
}
],
"displayVersion": "48.0 Beta 7",
"partials": [
{
"filesize": 11139337,
"from": "Firefox-48.0b6-build1",
"hashValue": "5c02802b876c825c736394c40af962bfbf780c464efded64f758fe57fcc7ebbb83ae8898ff26f473ba770ffb4950c60b6c7f73128e20a3b6f63dd7a83865c573"
}
],
"platformVersion": "48.0"
}
}
}
},
"schema_version": 4
}
So, comment #1 and on actually describe a slightly different bug than this one. At no point do any of these requests raise an OutdatedDataError, so there's no failure to rollback. What's happened in this case is:
* Both clients made an initial GET to /api/releases/Firefox-48.0b7-build1 to get the current state of the object and received a 404 (initiated at https://github.com/mozilla/build-tools/blob/8be272b13e757d61f6e9490b300509346c8146cd/lib/python/balrog/submitter/cli.py#L296)
* Both then tried to submit their own version of the blob (initiated at https://github.com/mozilla/build-tools/blob/8be272b13e757d61f6e9490b300509346c8146cd/lib/python/balrog/submitter/cli.py#L308)
* Deep in the API code, we have helpers that do a prequest to get and set a csrf_token and potentially the data_version. Because the release doesn't exist yet, the data data_version passed to the submitter is set to None, which makes https://github.com/mozilla/build-tools/blob/master/lib/python/balrog/submitter/api.py#L79 automatically set it to the current version. Client #1 got a 404 when requesting the current version, so it didn't set it. Client #2 got a 200, so it set to data_version 2. Because client #2 had client #1's data_version, but not data, it ended up overwriting it.
Comment hidden (Intermittent Failures Robot) |
Comment 2•9 years ago
|
||
What would it take to eliminate this footgun? Difficult, timeline, etc? (I don't quite understand all the implications of changing this code, but want to see footguns gone)
Flags: needinfo?(rail)
Flags: needinfo?(bhearsum)
Comment 3•9 years ago
|
||
(In reply to Justin Wood (:Callek) from comment #2)
> What would it take to eliminate this footgun? Difficult, timeline, etc? (I
> don't quite understand all the implications of changing this code, but want
> to see footguns gone)
..I might have meant Bug 1246993, unsure.
Comment 4•9 years ago
|
||
Maybe change https://github.com/mozilla/build-tools/blob/8be272b13e757d61f6e9490b300509346c8146cd/lib/python/balrog/submitter/cli.py#L103 to avoid using None and use 1 or 0 instead?
Flags: needinfo?(rail)
Reporter | ||
Comment 5•9 years ago
|
||
(In reply to Justin Wood (:Callek) from comment #2)
> What would it take to eliminate this footgun? Difficult, timeline, etc? (I
> don't quite understand all the implications of changing this code, but want
> to see footguns gone)
The fix is:
(In reply to Ben Hearsum (:bhearsum) from comment #0)
> I think the right fix here is to remove the magic data_version setting from
> api.py and force callers to deal with it. Code as low level as that is
> unlikely to have the necessary information to know whether or not the
> data_version it receives from the server is correct or not.
...and deal with whatever (if any) changes to cli.py are necessary to cope with this.
Flags: needinfo?(bhearsum)
Assignee | ||
Updated•7 years ago
|
Component: General Automation → General
Updated•7 years ago
|
Component: General → Release Automation: Updates
QA Contact: catlee → mtabara
Updated•2 years ago
|
Severity: normal → S3
Updated•3 months ago
|
QA Contact: mtabara
Updated•3 months ago
|
Component: Release Automation: Updates → Release Automation
You need to log in
before you can comment on or make changes to this bug.
Description
•