Open Bug 1286318 Opened 9 years ago Updated 3 months ago

race condition caused by automatic data_version setting in balrog api client

Categories

(Release Engineering :: Release Automation, defect)

defect

Tracking

(Not tracked)

People

(Reporter: bhearsum, Unassigned)

Details

tl;dr - automatic data_version setting in https://github.com/mozilla/build-tools/blob/master/lib/python/balrog/submitter/api.py has a race condition that can happen when two clients try to submit a new release at the same time. I think the right fix here is to remove the magic data_version setting from api.py and force callers to deal with it. Code as low level as that is unlikely to have the necessary information to know whether or not the data_version it receives from the server is correct or not. More details: We had two Funsize tasks doing submission around the same time: 1) Client log: 2016-07-11 19:45:08,957 - DEBUG - "GET /api/releases/Firefox-48.0b7-build1 HTTP/1.1" 404 0 2016-07-11 19:45:08,979 - DEBUG - "HEAD /api/releases/Firefox-48.0b7-build1 HTTP/1.1" 404 0 2016-07-11 19:45:08,997 - DEBUG - "HEAD /api/csrf_token HTTP/1.1" 200 0 2016-07-11 19:45:08,998 - DEBUG - Balrog request to http://balrog/api/releases/Firefox-48.0b7-build1/builds/WINNT_x86-msvc/en-US 2016-07-11 19:45:08,998 - DEBUG - Data sent: {'data': '{"buildID": "20160711002726", "appVersion": "48.0", "displayVersion": "48.0 Beta 7", "partials": [{"hashValue": "cd2ed3a8bd8f1141c02cc9ef30f07b057544923585cccb67c5d4a91631e47cc2e55a2136a10a982cbf6a0b533ab4de5f1b2d1749a135b6 28c29f3a640c3025c6", "from": "Firefox-48.0b5-build1", "filesize": 11388739}], "platformVersion": "48.0", "completes": [{"hashValue": "08bc5 bd775de65749fe1500478966a04b973b8f4b5761be621a18fe4df38721d9143f11ef81942f3da9e793d8e0a31187e15ad35702ab1b8774b2e149b7588c2", "from": "*", "filesize": 53729269}]}', 'product': u'Firefox', 'hashFunction': 'sha512', 'schema_version': '4'} 2016-07-11 19:45:09,064 - DEBUG - "PUT /api/releases/Firefox-48.0b7-build1/builds/WINNT_x86-msvc/en-US HTTP/1.1" 201 23 2016-07-11 19:45:09,065 - DEBUG - REQUEST STATS: {"url": "http://balrog/api/releases/Firefox-48.0b7-build1/builds/WINNT_x86-msvc/en-US", "t imestamp": 1468266309.065282, "method": "PUT", "elapsed_secs": 0.06684613227844238, "status_code": 201} Server log: 10.8.75.73 - ffxbld [11/Jul/2016:19:45:08 +0000] "GET /api/releases/Firefox-48.0b7-build1 HTTP/1.0" 404 - "-" "python-requests/2.4.3 CPython/2.7.9 Linux/3.13.0-83-generic" 10.8.75.73 - ffxbld [11/Jul/2016:19:45:08 +0000] "HEAD /api/releases/Firefox-48.0b7-build1 HTTP/1.0" 404 - "-" "python-requests/2.4.3 CPython/2.7.9 Linux/3.13.0-83-generic" 10.8.75.73 - ffxbld [11/Jul/2016:19:45:08 +0000] "HEAD /api/csrf_token HTTP/1.0" 200 - "-" "python-requests/2.4.3 CPython/2.7.9 Linux/3.13.0-83-generic" 10.8.75.73 - ffxbld [11/Jul/2016:19:45:08 +0000] "PUT /api/releases/Firefox-48.0b7-build1/builds/WINNT_x86-msvc/en-US HTTP/1.0" 201 23 "-" "python-requests/2.4.3 CPython/2.7.9 Linux/3.13.0-83-generic" 2) Client log: 2016-07-11 19:45:09,044 - DEBUG - "GET /api/releases/Firefox-48.0b7-build1 HTTP/1.1" 404 0 2016-07-11 19:45:09,071 - DEBUG - "HEAD /api/releases/Firefox-48.0b7-build1 HTTP/1.1" 200 0 2016-07-11 19:45:09,074 - DEBUG - Data sent: {'data_version': '2', 'product': u'Firefox', 'hashFunction': 'sha512', 'data': '{"buildID": "2 0160711002726", "appVersion": "48.0", "displayVersion": "48.0 Beta 7", "partials": [{"hashValue": "5c02802b876c825c736394c40af962bfbf780c46 4efded64f758fe57fcc7ebbb83ae8898ff26f473ba770ffb4950c60b6c7f73128e20a3b6f63dd7a83865c573", "from": "Firefox-48.0b6-build1", "filesize": 111 39337}], "platformVersion": "48.0", "completes": [{"hashValue": "08bc5bd775de65749fe1500478966a04b973b8f4b5761be621a18fe4df38721d9143f11ef8 1942f3da9e793d8e0a31187e15ad35702ab1b8774b2e149b7588c2", "from": "*", "filesize": 53729269}]}', 'schema_version': '4'} 2016-07-11 19:45:09,129 - DEBUG - "PUT /api/releases/Firefox-48.0b7-build1/builds/WINNT_x86-msvc/en-US HTTP/1.1" 200 23 Server log: 10.8.75.73 - ffxbld [11/Jul/2016:19:45:08 +0000] "GET /api/releases/Firefox-48.0b7-build1 HTTP/1.0" 404 - "-" "python-requests/2.4.3 CPython/2.7.9 Linux/3.13.0-83-generic" 10.8.75.73 - ffxbld [11/Jul/2016:19:45:08 +0000] "HEAD /api/releases/Firefox-48.0b7-build1 HTTP/1.0" 200 - "-" "python-requests/2.4.3 CPython/2.7.9 Linux/3.13.0-83-generic" 10.8.75.73 - ffxbld [11/Jul/2016:19:45:08 +0000] "PUT /api/releases/Firefox-48.0b7-build1/builds/WINNT_x86-msvc/en-US HTTP/1.0" 200 23 "-" "python-requests/2.4.3 CPython/2.7.9 Linux/3.13.0-83-generic" We ended up with this sequence of blobs: data_version 1: { "hashFunction": "sha512", "name": "Firefox-48.0b7-build1", "schema_version": 4 } data_version 2 (contains the complete + partial from b5): { "hashFunction": "sha512", "name": "Firefox-48.0b7-build1", "platforms": { "WINNT_x86-msvc": { "locales": { "en-US": { "appVersion": "48.0", "buildID": "20160711002726", "completes": [ { "filesize": 53729269, "from": "*", "hashValue": "08bc5bd775de65749fe1500478966a04b973b8f4b5761be621a18fe4df38721d9143f11ef81942f3da9e793d8e0a31187e15ad35702ab1b8774b2e149b7588c2" } ], "displayVersion": "48.0 Beta 7", "partials": [ { "filesize": 11388739, "from": "Firefox-48.0b5-build1", "hashValue": "cd2ed3a8bd8f1141c02cc9ef30f07b057544923585cccb67c5d4a91631e47cc2e55a2136a10a982cbf6a0b533ab4de5f1b2d1749a135b628c29f3a640c3025c6" } ], "platformVersion": "48.0" } } } }, "schema_version": 4 } data_version 3 (contains the complete and the partial from b6): { "hashFunction": "sha512", "name": "Firefox-48.0b7-build1", "platforms": { "WINNT_x86-msvc": { "locales": { "en-US": { "appVersion": "48.0", "buildID": "20160711002726", "completes": [ { "filesize": 53729269, "from": "*", "hashValue": "08bc5bd775de65749fe1500478966a04b973b8f4b5761be621a18fe4df38721d9143f11ef81942f3da9e793d8e0a31187e15ad35702ab1b8774b2e149b7588c2" } ], "displayVersion": "48.0 Beta 7", "partials": [ { "filesize": 11139337, "from": "Firefox-48.0b6-build1", "hashValue": "5c02802b876c825c736394c40af962bfbf780c464efded64f758fe57fcc7ebbb83ae8898ff26f473ba770ffb4950c60b6c7f73128e20a3b6f63dd7a83865c573" } ], "platformVersion": "48.0" } } } }, "schema_version": 4 } So, comment #1 and on actually describe a slightly different bug than this one. At no point do any of these requests raise an OutdatedDataError, so there's no failure to rollback. What's happened in this case is: * Both clients made an initial GET to /api/releases/Firefox-48.0b7-build1 to get the current state of the object and received a 404 (initiated at https://github.com/mozilla/build-tools/blob/8be272b13e757d61f6e9490b300509346c8146cd/lib/python/balrog/submitter/cli.py#L296) * Both then tried to submit their own version of the blob (initiated at https://github.com/mozilla/build-tools/blob/8be272b13e757d61f6e9490b300509346c8146cd/lib/python/balrog/submitter/cli.py#L308) * Deep in the API code, we have helpers that do a prequest to get and set a csrf_token and potentially the data_version. Because the release doesn't exist yet, the data data_version passed to the submitter is set to None, which makes https://github.com/mozilla/build-tools/blob/master/lib/python/balrog/submitter/api.py#L79 automatically set it to the current version. Client #1 got a 404 when requesting the current version, so it didn't set it. Client #2 got a 200, so it set to data_version 2. Because client #2 had client #1's data_version, but not data, it ended up overwriting it.
What would it take to eliminate this footgun? Difficult, timeline, etc? (I don't quite understand all the implications of changing this code, but want to see footguns gone)
Flags: needinfo?(rail)
Flags: needinfo?(bhearsum)
(In reply to Justin Wood (:Callek) from comment #2) > What would it take to eliminate this footgun? Difficult, timeline, etc? (I > don't quite understand all the implications of changing this code, but want > to see footguns gone) ..I might have meant Bug 1246993, unsure.
(In reply to Justin Wood (:Callek) from comment #2) > What would it take to eliminate this footgun? Difficult, timeline, etc? (I > don't quite understand all the implications of changing this code, but want > to see footguns gone) The fix is: (In reply to Ben Hearsum (:bhearsum) from comment #0) > I think the right fix here is to remove the magic data_version setting from > api.py and force callers to deal with it. Code as low level as that is > unlikely to have the necessary information to know whether or not the > data_version it receives from the server is correct or not. ...and deal with whatever (if any) changes to cli.py are necessary to cope with this.
Flags: needinfo?(bhearsum)
Component: General Automation → General
Component: General → Release Automation: Updates
QA Contact: catlee → mtabara
Severity: normal → S3
QA Contact: mtabara
Component: Release Automation: Updates → Release Automation
You need to log in before you can comment on or make changes to this bug.