Closed Bug 1246993 Opened 8 years ago Closed 7 years ago

releases api (and maybe others) doesn't properly rollback transactions in all failure cases

Categories

(Release Engineering Graveyard :: Applications: Balrog (backend), defect, P1)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bhearsum, Assigned: bhearsum)

References

Details

(Whiteboard: [lang=python][ready])

Attachments

(1 file)

I discovered this while doing some testing of bug 741412. The problem boils down to this: for a lot of types of errors, we *return* 400s from views instead of raising exceptions. These views are wrapped by the AdminView base class (https://github.com/mozilla/balrog/blob/master/auslib/admin/views/base.py#L53) which creates an AUSTransaction in a context manager and passes it along. By returning Responses instead of raising exceptions, .commit() is called when the context manager exits instead of .rollback(). In most cases this doesn't cause an issue (because 400s are raised early, before any database changes are made), but any case where the database is changed, and then a 400 is returned, that change is never rolled back.

This appears to be any issue for any endpoint that uses the big changeRelease helper in a case where the release doesn't already exist. That helper creates the release: https://github.com/mozilla/balrog/blob/fc4862d228155cf3c89dfaf7377f6b03df2c50dd/auslib/admin/views/releases.py#L135, but then could raise 400s later, eg at: https://github.com/mozilla/balrog/blob/fc4862d228155cf3c89dfaf7377f6b03df2c50dd/auslib/admin/views/releases.py#L168

Raising and handling a custom exception seems to be a common way to deal with these cases: http://flask.pocoo.org/docs/0.10/patterns/apierrors/, I'm not certain how well this will work with our custom wrapper though. I tried a few obvious things, and couldn't find a quick solution that wasn't horribly hacky.
We probably hit this in Firefox 48.0b7:

https://tools.taskcluster.net/task-inspector/#HSjj109mSgy0H-6eDung7w/ run 0 tried to submit update data to balrog, but the data didn't show up, even though the server returned 201 (initial submission, no data_version set).

Final verify failed to find a partial update in https://aus5.mozilla.org/update/3/Firefox/48.0/20160630123429/WINNT_x86-msvc-x86/en-US/beta-localtest/default/default/default/update.xml?force=1

After some investigation we found that the release blob doesn't contain the corresponding entry:

jq '.["platforms"]["WINNT_x86-msvc"]["locales"]["en-US"]["partials"]' Firefox-48.0b7-build1.json
[
  {
    "hashValue": "5c02802b876c825c736394c40af962bfbf780c464efded64f758fe57fcc7ebbb83ae8898ff26f473ba770ffb4950c60b6c7f73128e20a3b6f63dd7a83865c573",
    "from": "Firefox-48.0b6-build1",
    "filesize": 11139337
  },
  {
    "hashValue": "aa631da92a3a0d0335b419fb76944545d04aadceb41e1724a69fbdaeaa97b9b9724d7a8642419b736a658c282bf7f43e8fd352b0e273a93b8e7ade52f7760e48",
    "from": "Firefox-48.0b4-build1",
    "filesize": 11881264
  }
]


To fix the issue we rerun the same task and it worked fine.

Some logs below
Final verify error:

Mon Jul 11 22:39:20 UTC 2016:  FAILURE 1: Patch type 'partial' not present in the downloaded update.xml file.
Mon Jul 11 22:39:20 UTC 2016:  
Mon Jul 11 22:39:20 UTC 2016:      Update xml file downloaded from: https://aus5.mozilla.org/update/3/Firefox/48.0/20160630123429/WINNT_x86-msvc-x86/en-US/beta-cdntest/default/default/default/update.xml?force=1
Mon Jul 11 22:39:20 UTC 2016:      Curl returned exit code: 0 (success)
Mon Jul 11 22:39:20 UTC 2016:  
Mon Jul 11 22:39:20 UTC 2016:      The HTTP headers were:
Mon Jul 11 22:39:20 UTC 2016:          HTTP/1.1 200 OK
Mon Jul 11 22:39:20 UTC 2016:          Content-Type: text/xml; charset=utf-8
Mon Jul 11 22:39:20 UTC 2016:          Date: Mon, 11 Jul 2016 22:36:50 GMT
Mon Jul 11 22:39:20 UTC 2016:          X-Proxy-Cache-Status: MISS
Mon Jul 11 22:39:20 UTC 2016:          Content-Length: 564
Mon Jul 11 22:39:20 UTC 2016:          Connection: keep-alive
Mon Jul 11 22:39:20 UTC 2016:          
Mon Jul 11 22:39:20 UTC 2016:  
Mon Jul 11 22:39:20 UTC 2016:      The full curl debug output was:
Mon Jul 11 22:39:20 UTC 2016:          * Hostname was NOT found in DNS cache
Mon Jul 11 22:39:20 UTC 2016:          *   Trying 54.187.134.130...
Mon Jul 11 22:39:20 UTC 2016:          * Connected to aus5.mozilla.org (54.187.134.130) port 443 (#0)
Mon Jul 11 22:39:20 UTC 2016:          * successfully set certificate verify locations:
Mon Jul 11 22:39:20 UTC 2016:          *   CAfile: none
Mon Jul 11 22:39:20 UTC 2016:            CApath: /etc/ssl/certs
Mon Jul 11 22:39:20 UTC 2016:          * SSLv3, TLS handshake, Client hello (1):
Mon Jul 11 22:39:20 UTC 2016:          } [data not shown]
Mon Jul 11 22:39:20 UTC 2016:          * SSLv3, TLS handshake, Server hello (2):
Mon Jul 11 22:39:20 UTC 2016:          { [data not shown]
Mon Jul 11 22:39:20 UTC 2016:          * SSLv3, TLS handshake, CERT (11):
Mon Jul 11 22:39:20 UTC 2016:          { [data not shown]
Mon Jul 11 22:39:20 UTC 2016:          * SSLv3, TLS handshake, Server key exchange (12):
Mon Jul 11 22:39:20 UTC 2016:          { [data not shown]
Mon Jul 11 22:39:20 UTC 2016:          * SSLv3, TLS handshake, Server finished (14):
Mon Jul 11 22:39:20 UTC 2016:          { [data not shown]
Mon Jul 11 22:39:20 UTC 2016:          * SSLv3, TLS handshake, Client key exchange (16):
Mon Jul 11 22:39:20 UTC 2016:          } [data not shown]
Mon Jul 11 22:39:20 UTC 2016:          * SSLv3, TLS change cipher, Client hello (1):
Mon Jul 11 22:39:20 UTC 2016:          } [data not shown]
Mon Jul 11 22:39:20 UTC 2016:          * SSLv3, TLS handshake, Finished (20):
Mon Jul 11 22:39:20 UTC 2016:          } [data not shown]
Mon Jul 11 22:39:20 UTC 2016:          * SSLv3, TLS change cipher, Client hello (1):
Mon Jul 11 22:39:20 UTC 2016:          { [data not shown]
Mon Jul 11 22:39:20 UTC 2016:          * SSLv3, TLS handshake, Finished (20):
Mon Jul 11 22:39:20 UTC 2016:          { [data not shown]
Mon Jul 11 22:39:20 UTC 2016:          * SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
Mon Jul 11 22:39:20 UTC 2016:          * Server certificate:
Mon Jul 11 22:39:20 UTC 2016:          * 	 subject: C=US; ST=California; L=Mountain View; O=Mozilla Foundation; CN=aus5.mozilla.org
Mon Jul 11 22:39:20 UTC 2016:          * 	 start date: 2015-07-24 00:00:00 GMT
Mon Jul 11 22:39:20 UTC 2016:          * 	 expire date: 2017-07-28 12:00:00 GMT
Mon Jul 11 22:39:20 UTC 2016:          * 	 issuer: C=US; O=DigiCert Inc; CN=DigiCert SHA2 Secure Server CA
Mon Jul 11 22:39:20 UTC 2016:          * 	 SSL certificate verify ok.
Mon Jul 11 22:39:20 UTC 2016:          > GET /update/3/Firefox/48.0/20160630123429/WINNT_x86-msvc-x86/en-US/beta-cdntest/default/default/default/update.xml?force=1 HTTP/1.1
Mon Jul 11 22:39:20 UTC 2016:          > User-Agent: curl/7.38.0
Mon Jul 11 22:39:20 UTC 2016:          > Host: aus5.mozilla.org
Mon Jul 11 22:39:20 UTC 2016:          > Accept: */*
Mon Jul 11 22:39:20 UTC 2016:          > 
Mon Jul 11 22:39:20 UTC 2016:          < HTTP/1.1 200 OK
Mon Jul 11 22:39:20 UTC 2016:          < Content-Type: text/xml; charset=utf-8
Mon Jul 11 22:39:20 UTC 2016:          < Date: Mon, 11 Jul 2016 22:36:50 GMT
Mon Jul 11 22:39:20 UTC 2016:          < X-Proxy-Cache-Status: MISS
Mon Jul 11 22:39:20 UTC 2016:          < Content-Length: 564
Mon Jul 11 22:39:20 UTC 2016:          < Connection: keep-alive
Mon Jul 11 22:39:20 UTC 2016:          < 
Mon Jul 11 22:39:20 UTC 2016:          { [data not shown]
Mon Jul 11 22:39:20 UTC 2016:          * Connection #0 to host aus5.mozilla.org left intact
Mon Jul 11 22:39:20 UTC 2016:  
Mon Jul 11 22:39:20 UTC 2016:      The returned update.xml file was:
Mon Jul 11 22:39:20 UTC 2016:          <?xml version="1.0"?>
Mon Jul 11 22:39:20 UTC 2016:          <updates>
Mon Jul 11 22:39:20 UTC 2016:              <update type="minor" displayVersion="48.0 Beta 7" appVersion="48.0" platformVersion="48.0" buildID="20160711002726" detailsURL="https://www.mozilla.org/en-US/firefox/48.0/releasenotes/">
Mon Jul 11 22:39:20 UTC 2016:                  <patch type="complete" URL="http://download.mozilla.org/?product=firefox-48.0b7-complete&amp;os=win&amp;lang=en-US&amp;force=1" hashFunction="sha512" hashValue="08bc5bd775de65749fe1500478966a04b973b8f4b5761be621a18fe4df38721d9143f11ef81942f3da9e793d8e0a31187e15ad35702ab1b8774b2e149b7588c2" size="53729269"/>
Mon Jul 11 22:39:20 UTC 2016:              </update>
Mon Jul 11 22:39:20 UTC 2016:          </updates>
Mon Jul 11 22:39:20 UTC 2016:  
Mon Jul 11 22:39:20 UTC 2016:      This url and patch type combination was tested due to the following cfg file entries:
Mon Jul 11 22:39:20 UTC 2016:          beta-firefox-win32.cfg line 2: release="48.0" product="Firefox" platform="WINNT_x86-msvc-x86" build_id="20160630123429" locales="ach af an ar as ast az be bg bn-BD bn-IN br bs ca cak cs cy da de dsb el en-GB en-US en-ZA eo es-AR es-CL es-ES es-MX et eu fa ff fi fr fy-NL ga-IE gd gl gn gu-IN he hi-IN hr hsb hu hy-AM id is it ja kk km kn ko lij lt lv mai mk ml mr ms nb-NO nl nn-NO or pa-IN pl pt-BR pt-PT rm ro ru si sk sl son sq sr sv-SE ta te th tr uk uz vi xh zh-CN zh-TW" channel="beta-localtest" patch_types="complete partial" from="/firefox/releases/48.0b5/win32/%locale%/Firefox Setup 48.0b5.exe" ftp_server_from="https://archive.mozilla.org/pub" ftp_server_to="https://archive.mozilla.org/pub"
Mon Jul 11 22:39:20 UTC 2016:  
Mon Jul 11 22:39:20 UTC 2016:  
Mon Jul 11 22:39:20 UTC 2016:  ====================================
Mon Jul 11 22:39:20 UTC 2016:  KEY STATS
Mon Jul 11 22:39:20 UTC 2016:  ====================================
Mon Jul 11 22:39:20 UTC 2016:  
Mon Jul 11 22:39:20 UTC 2016:  Config files scanned:                       5
Mon Jul 11 22:39:20 UTC 2016:  Update xml files downloaded and parsed:     24243
Mon Jul 11 22:39:20 UTC 2016:  Unique mar urls found:                      1819
Mon Jul 11 22:39:20 UTC 2016:  Failures:                                   1
Mon Jul 11 22:39:20 UTC 2016:  Parallel processes used (maximum limit):    48
Mon Jul 11 22:39:20 UTC 2016:  Execution time:                             420 seconds
Mon Jul 11 22:39:20 UTC 2016:
Initial submission:


https://public-artifacts.taskcluster.net/HSjj109mSgy0H-6eDung7w/0/public/logs/live_backing.log

+ cat /home/worker/artifacts/manifest.json
[
  {
    "ACCEPTED_MAR_CHANNEL_IDS": "firefox-mozilla-beta", 
    "appName": "Firefox", 
    "branch": "mozilla-beta", 
    "detached_signatures": {
      "gpg": "firefox-48.0b5-48.0b7.en-US.win32.partial.mar.asc"
    }, 
    "from_buildid": "20160630123429", 
    "from_hash": "755a970d7047879af13ea6b7ae9d476f82ea798881c044cde5e7fbee7d7896524f2f2af75a4f0fb5221c848a387366a8add66d8bd0dc76b90f484a058519972a", 
    "from_mar": "http://download.mozilla.org/?product=firefox-48.0b5-complete&os=win&lang=en-US", 
    "from_size": 53510168, 
    "hash": "cd2ed3a8bd8f1141c02cc9ef30f07b057544923585cccb67c5d4a91631e47cc2e55a2136a10a982cbf6a0b533ab4de5f1b2d1749a135b628c29f3a640c3025c6", 
    "locale": "en-US", 
    "mar": "firefox-48.0b5-48.0b7.en-US.win32.partial.mar", 
    "platform": "win32", 
    "previousBuildNumber": 1, 
    "previousVersion": "48.0b5", 
    "repo": "https://hg.mozilla.org/releases/mozilla-beta", 
    "revision": "9d734024ed35d74449601cc04917b327e0973c0d", 
    "size": 11388739, 
    "toBuildNumber": 1, 
    "toVersion": "48.0b7", 
    "to_buildid": "20160711002726", 
    "to_hash": "08bc5bd775de65749fe1500478966a04b973b8f4b5761be621a18fe4df38721d9143f11ef81942f3da9e793d8e0a31187e15ad35702ab1b8774b2e149b7588c2", 
    "to_mar": "https://queue.taskcluster.net/v1/task/P2yEJD3iTY6GxLaVcEe1rg/artifacts/public/build/firefox-48.0.en-US.win32.complete.mar", 
    "to_size": 53729269, 
    "version": "48.0"
  }
]+ python /home/worker/bin/funsize-balrog-submitter.py --artifacts-url-prefix https://queue.taskcluster.net/v1/task/uTa0SBjRTXqUHbesdP40_g/artifacts/public/env --manifest /home/worker/artifacts/manifest.json -a http://balrog/api --signing-cert /home/worker/keys/release.pubkey --verbose
2016-07-11 19:45:08,920 - WARNING - Skipping S3 uploads...
2016-07-11 19:45:08,920 - INFO - Release style balrog submission
2016-07-11 19:45:08,920 - DEBUG - attempt 1/5
2016-07-11 19:45:08,921 - DEBUG - retry: Calling <function <lambda> at 0x7f8c19d12b90> with args: (), kwargs: {}, attempt #1
2016-07-11 19:45:08,922 - DEBUG - Balrog request to http://balrog/api/releases/Firefox-48.0b7-build1
2016-07-11 19:45:08,922 - DEBUG - Data sent: None
2016-07-11 19:45:08,925 - INFO - Starting new HTTP connection (1): balrog
2016-07-11 19:45:08,957 - DEBUG - "GET /api/releases/Firefox-48.0b7-build1 HTTP/1.1" 404 0
2016-07-11 19:45:08,957 - ERROR - Caught HTTPError: 
2016-07-11 19:45:08,957 - DEBUG - REQUEST STATS: {"url": "http://balrog/api/releases/Firefox-48.0b7-build1", "timestamp": 1468266308.957886, "method": "GET", "elapsed_secs": 0.03546881675720215, "status_code": 404}
2016-07-11 19:45:08,958 - DEBUG - Balrog request to http://balrog/api/releases/Firefox-48.0b7-build1
2016-07-11 19:45:08,958 - DEBUG - Data sent: None
2016-07-11 19:45:08,959 - INFO - Starting new HTTP connection (1): balrog
2016-07-11 19:45:08,979 - DEBUG - "HEAD /api/releases/Firefox-48.0b7-build1 HTTP/1.1" 404 0
2016-07-11 19:45:08,979 - ERROR - Caught HTTPError: 
2016-07-11 19:45:08,980 - DEBUG - REQUEST STATS: {"url": "http://balrog/api/releases/Firefox-48.0b7-build1", "timestamp": 1468266308.98004, "method": "HEAD", "elapsed_secs": 0.02099299430847168, "status_code": 404}
2016-07-11 19:45:08,980 - DEBUG - Balrog request to http://balrog/api/csrf_token
2016-07-11 19:45:08,980 - DEBUG - Data sent: None
2016-07-11 19:45:08,997 - DEBUG - "HEAD /api/csrf_token HTTP/1.1" 200 0
2016-07-11 19:45:08,998 - DEBUG - REQUEST STATS: {"url": "http://balrog/api/csrf_token", "timestamp": 1468266308.998027, "method": "HEAD", "elapsed_secs": 0.01717209815979004, "status_code": 200}
2016-07-11 19:45:08,998 - DEBUG - Balrog request to http://balrog/api/releases/Firefox-48.0b7-build1/builds/WINNT_x86-msvc/en-US
2016-07-11 19:45:08,998 - DEBUG - Data sent: {'data': '{"buildID": "20160711002726", "appVersion": "48.0", "displayVersion": "48.0 Beta 7", "partials": [{"hashValue": "cd2ed3a8bd8f1141c02cc9ef30f07b057544923585cccb67c5d4a91631e47cc2e55a2136a10a982cbf6a0b533ab4de5f1b2d1749a135b628c29f3a640c3025c6", "from": "Firefox-48.0b5-build1", "filesize": 11388739}], "platformVersion": "48.0", "completes": [{"hashValue": "08bc5bd775de65749fe1500478966a04b973b8f4b5761be621a18fe4df38721d9143f11ef81942f3da9e793d8e0a31187e15ad35702ab1b8774b2e149b7588c2", "from": "*", "filesize": 53729269}]}', 'product': u'Firefox', 'hashFunction': 'sha512', 'schema_version': '4'}
2016-07-11 19:45:09,064 - DEBUG - "PUT /api/releases/Firefox-48.0b7-build1/builds/WINNT_x86-msvc/en-US HTTP/1.1" 201 23
2016-07-11 19:45:09,065 - DEBUG - REQUEST STATS: {"url": "http://balrog/api/releases/Firefox-48.0b7-build1/builds/WINNT_x86-msvc/en-US", "timestamp": 1468266309.065282, "method": "PUT", "elapsed_secs": 0.06684613227844238, "status_code": 201}
[taskcluster 2016-07-11 19:45:09.288Z] === Task Finished ===
[taskcluster 2016-07-11 19:45:10.306Z] Successful task run with exit code: 0 completed in 187.54 seconds
Rerun https://public-artifacts.taskcluster.net/HSjj109mSgy0H-6eDung7w/1/public/logs/live_backing.log:


+ cat /home/worker/artifacts/manifest.json
[
  {
    "ACCEPTED_MAR_CHANNEL_IDS": "firefox-mozilla-beta", 
    "appName": "Firefox", 
    "branch": "mozilla-beta", 
    "detached_signatures": {
      "gpg": "firefox-48.0b5-48.0b7.en-US.win32.partial.mar.asc"
    }, 
    "from_buildid": "20160630123429", 
    "from_hash": "755a970d7047879af13ea6b7ae9d476f82ea798881c044cde5e7fbee7d7896524f2f2af75a4f0fb5221c848a387366a8add66d8bd0dc76b90f484a058519972a", 
    "from_mar": "http://download.mozilla.org/?product=firefox-48.0b5-complete&os=win&lang=en-US", 
    "from_size": 53510168, 
    "hash": "cd2ed3a8bd8f1141c02cc9ef30f07b057544923585cccb67c5d4a91631e47cc2e55a2136a10a982cbf6a0b533ab4de5f1b2d1749a135b628c29f3a640c3025c6", 
    "locale": "en-US", 
    "mar": "firefox-48.0b5-48.0b7.en-US.win32.partial.mar", 
    "platform": "win32", 
    "previousBuildNumber": 1, 
    "previousVersion": "48.0b5", 
    "repo": "https://hg.mozilla.org/releases/mozilla-beta", 
    "revision": "9d734024ed35d74449601cc04917b327e0973c0d", 
    "size": 11388739, 
    "toBuildNumber": 1, 
    "toVersion": "48.0b7", 
    "to_buildid": "20160711002726", 
    "to_hash": "08bc5bd775de65749fe1500478966a04b973b8f4b5761be621a18fe4df38721d9143f11ef81942f3da9e793d8e0a31187e15ad35702ab1b8774b2e149b7588c2", 
    "to_mar": "https://queue.taskcluster.net/v1/task/P2yEJD3iTY6GxLaVcEe1rg/artifacts/public/build/firefox-48.0.en-US.win32.complete.mar", 
    "to_size": 53729269, 
    "version": "48.0"
  }
]+ python /home/worker/bin/funsize-balrog-submitter.py --artifacts-url-prefix https://queue.taskcluster.net/v1/task/uTa0SBjRTXqUHbesdP40_g/artifacts/public/env --manifest /home/worker/artifacts/manifest.json -a http://balrog/api --signing-cert /home/worker/keys/release.pubkey --verbose
2016-07-12 12:43:11,978 - WARNING - Skipping S3 uploads...
2016-07-12 12:43:11,978 - INFO - Release style balrog submission
2016-07-12 12:43:11,979 - DEBUG - attempt 1/5
2016-07-12 12:43:11,979 - DEBUG - retry: Calling <function <lambda> at 0x7f31c4bf0b90> with args: (), kwargs: {}, attempt #1
2016-07-12 12:43:11,980 - DEBUG - Balrog request to http://balrog/api/releases/Firefox-48.0b7-build1
2016-07-12 12:43:11,980 - DEBUG - Data sent: None
2016-07-12 12:43:11,983 - INFO - Starting new HTTP connection (1): balrog
2016-07-12 12:43:14,669 - DEBUG - "GET /api/releases/Firefox-48.0b7-build1 HTTP/1.1" 200 427921
2016-07-12 12:43:15,328 - DEBUG - REQUEST STATS: {"url": "http://balrog/api/releases/Firefox-48.0b7-build1", "timestamp": 1468327395.328604, "method": "GET", "elapsed_secs": 3.347925901412964, "status_code": 200}
2016-07-12 12:43:15,340 - DEBUG - Balrog request to http://balrog/api/releases/Firefox-48.0b7-build1/builds/WINNT_x86-msvc/en-US
2016-07-12 12:43:15,340 - DEBUG - Data sent: None
2016-07-12 12:43:15,341 - INFO - Starting new HTTP connection (1): balrog
2016-07-12 12:43:18,205 - DEBUG - "GET /api/releases/Firefox-48.0b7-build1/builds/WINNT_x86-msvc/en-US HTTP/1.1" 200 726
2016-07-12 12:43:18,206 - DEBUG - REQUEST STATS: {"url": "http://balrog/api/releases/Firefox-48.0b7-build1/builds/WINNT_x86-msvc/en-US", "timestamp": 1468327398.206534, "method": "GET", "elapsed_secs": 2.8661460876464844, "status_code": 200}
2016-07-12 12:43:18,208 - DEBUG - Balrog request to http://balrog/api/releases/Firefox-48.0b7-build1
2016-07-12 12:43:18,208 - DEBUG - Data sent: None
2016-07-12 12:43:20,797 - DEBUG - "HEAD /api/releases/Firefox-48.0b7-build1 HTTP/1.1" 200 0
2016-07-12 12:43:20,798 - DEBUG - REQUEST STATS: {"url": "http://balrog/api/releases/Firefox-48.0b7-build1", "timestamp": 1468327400.798333, "method": "HEAD", "elapsed_secs": 2.589545965194702, "status_code": 200}
2016-07-12 12:43:20,798 - DEBUG - Balrog request to http://balrog/api/releases/Firefox-48.0b7-build1/builds/WINNT_x86-msvc/en-US
2016-07-12 12:43:20,798 - DEBUG - Data sent: {'data_version': '1367', 'product': u'Firefox', 'hashFunction': 'sha512', 'data': '{"platformVersion": "48.0", "partials": [{"hashValue": "5c02802b876c825c736394c40af962bfbf780c464efded64f758fe57fcc7ebbb83ae8898ff26f473ba770ffb4950c60b6c7f73128e20a3b6f63dd7a83865c573", "from": "Firefox-48.0b6-build1", "filesize": 11139337}, {"hashValue": "aa631da92a3a0d0335b419fb76944545d04aadceb41e1724a69fbdaeaa97b9b9724d7a8642419b736a658c282bf7f43e8fd352b0e273a93b8e7ade52f7760e48", "from": "Firefox-48.0b4-build1", "filesize": 11881264}, {"hashValue": "cd2ed3a8bd8f1141c02cc9ef30f07b057544923585cccb67c5d4a91631e47cc2e55a2136a10a982cbf6a0b533ab4de5f1b2d1749a135b628c29f3a640c3025c6", "from": "Firefox-48.0b5-build1", "filesize": 11388739}], "buildID": "20160711002726", "displayVersion": "48.0 Beta 7", "appVersion": "48.0", "completes": [{"hashValue": "08bc5bd775de65749fe1500478966a04b973b8f4b5761be621a18fe4df38721d9143f11ef81942f3da9e793d8e0a31187e15ad35702ab1b8774b2e149b7588c2", "from": "*", "filesize": 53729269}]}', 'schema_version': '4'}
2016-07-12 12:43:25,389 - DEBUG - "PUT /api/releases/Firefox-48.0b7-build1/builds/WINNT_x86-msvc/en-US HTTP/1.1" 200 26
2016-07-12 12:43:25,390 - DEBUG - REQUEST STATS: {"url": "http://balrog/api/releases/Firefox-48.0b7-build1/builds/WINNT_x86-msvc/en-US", "timestamp": 1468327405.390428, "method": "PUT", "elapsed_secs": 4.591722011566162, "status_code": 200}
[taskcluster 2016-07-12 12:43:25.555Z] === Task Finished ===
[taskcluster 2016-07-12 12:43:26.521Z] Successful task run with exit code: 0 completed in 163.952 seconds
(In reply to Rail Aliiev [:rail] from comment #5)
> Something tells me that
> http://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-beta-noarch/
> release-mozilla-beta-firefox_updates-bm72-build1-build12.txt.gz overwrote
> it....

I don't think it's this one...it has data_version 9...
tl;dr - automatic data_version setting in https://github.com/mozilla/build-tools/blob/master/lib/python/balrog/submitter/api.py has a race condition that can happen when two clients try to submit a new release at the same time. This is slightly different than the case described here, and is actually a bug in the client, not the server. Read on for more details.

We had two Funsize tasks doing submission around the same time:
1)
Client log:
2016-07-11 19:45:08,957 - DEBUG - "GET /api/releases/Firefox-48.0b7-build1 HTTP/1.1" 404 0 
2016-07-11 19:45:08,979 - DEBUG - "HEAD /api/releases/Firefox-48.0b7-build1 HTTP/1.1" 404 0 
2016-07-11 19:45:08,997 - DEBUG - "HEAD /api/csrf_token HTTP/1.1" 200 0 
2016-07-11 19:45:08,998 - DEBUG - Balrog request to http://balrog/api/releases/Firefox-48.0b7-build1/builds/WINNT_x86-msvc/en-US
2016-07-11 19:45:08,998 - DEBUG - Data sent: {'data': '{"buildID": "20160711002726", "appVersion": "48.0", "displayVersion": "48.0 Beta 7",
"partials": [{"hashValue": "cd2ed3a8bd8f1141c02cc9ef30f07b057544923585cccb67c5d4a91631e47cc2e55a2136a10a982cbf6a0b533ab4de5f1b2d1749a135b6
28c29f3a640c3025c6", "from": "Firefox-48.0b5-build1", "filesize": 11388739}], "platformVersion": "48.0", "completes": [{"hashValue": "08bc5
bd775de65749fe1500478966a04b973b8f4b5761be621a18fe4df38721d9143f11ef81942f3da9e793d8e0a31187e15ad35702ab1b8774b2e149b7588c2", "from": "*",
"filesize": 53729269}]}', 'product': u'Firefox', 'hashFunction': 'sha512', 'schema_version': '4'}
2016-07-11 19:45:09,064 - DEBUG - "PUT /api/releases/Firefox-48.0b7-build1/builds/WINNT_x86-msvc/en-US HTTP/1.1" 201 23
2016-07-11 19:45:09,065 - DEBUG - REQUEST STATS: {"url": "http://balrog/api/releases/Firefox-48.0b7-build1/builds/WINNT_x86-msvc/en-US", "t
imestamp": 1468266309.065282, "method": "PUT", "elapsed_secs": 0.06684613227844238, "status_code": 201}

Server log:
10.8.75.73 - ffxbld [11/Jul/2016:19:45:08 +0000] "GET /api/releases/Firefox-48.0b7-build1 HTTP/1.0" 404 - "-" "python-requests/2.4.3 CPython/2.7.9 Linux/3.13.0-83-generic"
10.8.75.73 - ffxbld [11/Jul/2016:19:45:08 +0000] "HEAD /api/releases/Firefox-48.0b7-build1 HTTP/1.0" 404 - "-" "python-requests/2.4.3 CPython/2.7.9 Linux/3.13.0-83-generic"
10.8.75.73 - ffxbld [11/Jul/2016:19:45:08 +0000] "HEAD /api/csrf_token HTTP/1.0" 200 - "-" "python-requests/2.4.3 CPython/2.7.9 Linux/3.13.0-83-generic"
10.8.75.73 - ffxbld [11/Jul/2016:19:45:08 +0000] "PUT /api/releases/Firefox-48.0b7-build1/builds/WINNT_x86-msvc/en-US HTTP/1.0" 201 23 "-" "python-requests/2.4.3 CPython/2.7.9 Linux/3.13.0-83-generic"


2)
Client log:
2016-07-11 19:45:09,044 - DEBUG - "GET /api/releases/Firefox-48.0b7-build1 HTTP/1.1" 404 0 
2016-07-11 19:45:09,071 - DEBUG - "HEAD /api/releases/Firefox-48.0b7-build1 HTTP/1.1" 200 0 
2016-07-11 19:45:09,074 - DEBUG - Data sent: {'data_version': '2', 'product': u'Firefox', 'hashFunction': 'sha512', 'data': '{"buildID": "2
0160711002726", "appVersion": "48.0", "displayVersion": "48.0 Beta 7", "partials": [{"hashValue": "5c02802b876c825c736394c40af962bfbf780c46
4efded64f758fe57fcc7ebbb83ae8898ff26f473ba770ffb4950c60b6c7f73128e20a3b6f63dd7a83865c573", "from": "Firefox-48.0b6-build1", "filesize": 111
39337}], "platformVersion": "48.0", "completes": [{"hashValue": "08bc5bd775de65749fe1500478966a04b973b8f4b5761be621a18fe4df38721d9143f11ef8
1942f3da9e793d8e0a31187e15ad35702ab1b8774b2e149b7588c2", "from": "*", "filesize": 53729269}]}', 'schema_version': '4'}
2016-07-11 19:45:09,129 - DEBUG - "PUT /api/releases/Firefox-48.0b7-build1/builds/WINNT_x86-msvc/en-US HTTP/1.1" 200 23 

Server log:
10.8.75.73 - ffxbld [11/Jul/2016:19:45:08 +0000] "GET /api/releases/Firefox-48.0b7-build1 HTTP/1.0" 404 - "-" "python-requests/2.4.3 CPython/2.7.9 Linux/3.13.0-83-generic"
10.8.75.73 - ffxbld [11/Jul/2016:19:45:08 +0000] "HEAD /api/releases/Firefox-48.0b7-build1 HTTP/1.0" 200 - "-" "python-requests/2.4.3 CPython/2.7.9 Linux/3.13.0-83-generic"
10.8.75.73 - ffxbld [11/Jul/2016:19:45:08 +0000] "PUT /api/releases/Firefox-48.0b7-build1/builds/WINNT_x86-msvc/en-US HTTP/1.0" 200 23 "-" "python-requests/2.4.3 CPython/2.7.9 Linux/3.13.0-83-generic"


We ended up with this sequence of blobs:
data_version 1:
{
  "hashFunction": "sha512",
  "name": "Firefox-48.0b7-build1",
  "schema_version": 4
}

data_version 2 (contains the complete + partial from b5):
{
  "hashFunction": "sha512",
  "name": "Firefox-48.0b7-build1",
  "platforms": {
    "WINNT_x86-msvc": {
      "locales": {
        "en-US": {
          "appVersion": "48.0",
          "buildID": "20160711002726",
          "completes": [
            {
              "filesize": 53729269,
              "from": "*",
              "hashValue": "08bc5bd775de65749fe1500478966a04b973b8f4b5761be621a18fe4df38721d9143f11ef81942f3da9e793d8e0a31187e15ad35702ab1b8774b2e149b7588c2"
            }
          ],
          "displayVersion": "48.0 Beta 7",
          "partials": [
            {
              "filesize": 11388739,
              "from": "Firefox-48.0b5-build1",
              "hashValue": "cd2ed3a8bd8f1141c02cc9ef30f07b057544923585cccb67c5d4a91631e47cc2e55a2136a10a982cbf6a0b533ab4de5f1b2d1749a135b628c29f3a640c3025c6"
            }
          ],
          "platformVersion": "48.0"
        }
      }
    }
  },
  "schema_version": 4
}

data_version 3 (contains the complete and the partial from b6):
{
  "hashFunction": "sha512",
  "name": "Firefox-48.0b7-build1",
  "platforms": {
    "WINNT_x86-msvc": {
      "locales": {
        "en-US": {
          "appVersion": "48.0",
          "buildID": "20160711002726",
          "completes": [
            {
              "filesize": 53729269,
              "from": "*",
              "hashValue": "08bc5bd775de65749fe1500478966a04b973b8f4b5761be621a18fe4df38721d9143f11ef81942f3da9e793d8e0a31187e15ad35702ab1b8774b2e149b7588c2"
            }
          ],
          "displayVersion": "48.0 Beta 7",
          "partials": [
            {
              "filesize": 11139337,
              "from": "Firefox-48.0b6-build1",
              "hashValue": "5c02802b876c825c736394c40af962bfbf780c464efded64f758fe57fcc7ebbb83ae8898ff26f473ba770ffb4950c60b6c7f73128e20a3b6f63dd7a83865c573"
            }
          ],
          "platformVersion": "48.0"
        }
      }
    }
  },
  "schema_version": 4
}


So, comment #1 and on actually describe a slightly different bug than this one. At no point do any of these requests raise an OutdatedDataError, so there's no failure to rollback. What's happened in this case is:
* Both clients made an initial GET to /api/releases/Firefox-48.0b7-build1 to get the current state of the object and received a 404 (initiated at https://github.com/mozilla/build-tools/blob/8be272b13e757d61f6e9490b300509346c8146cd/lib/python/balrog/submitter/cli.py#L296)
* Both then tried to submit their own version of the blob (initiated at https://github.com/mozilla/build-tools/blob/8be272b13e757d61f6e9490b300509346c8146cd/lib/python/balrog/submitter/cli.py#L308)
* Deep in the API code, we have helpers that do a prequest to get and set a csrf_token and potentially the data_version. Because the release doesn't exist yet, the data data_version passed to the submitter is set to None, which makes https://github.com/mozilla/build-tools/blob/master/lib/python/balrog/submitter/api.py#L79 automatically set it to the current version. Client #1 got a 404 when requesting the current version, so it didn't set it. Client #2 got a 200, so it set to data_version 2. Because client #2 had client #1's data_version, but not data, it ended up overwriting it.

I think the right fix here is to remove the magic data_version setting from api.py and force callers to deal with it. Code as low level as that is unlikely to have the necessary information to know whether or not the data_version it receives from the server is correct or not.
I filed https://bugzilla.mozilla.org/show_bug.cgi?id=1286318 for the api.py issue, leaving this bug to focus on the original one reported.
Whiteboard: [lang=python][ready]
Priority: -- → P3
While debugging how well the automatic Release merging was working I came across something odd - I found a Release that had multiple change_ids with the same name+data_version in the releases_history table. For example, there are two changes with data_version 227 on http://54.90.211.22:8080/releases/Firefox-58.0b1-build3. I'm nottcertain this is causing it, but it seems like it could be related.

This likely only happens when we have enough requests to update the same Release at the exact same time.
Assignee: nobody → bhearsum
Priority: P3 → P1
Blocks: 1400141
I'll have to take a look at this the second half of next week, apologies for the delay.
Commit pushed to master at https://github.com/mozilla/balrog

https://github.com/mozilla/balrog/commit/447e504e290fb349640db13638c4f3b4129d7744
bug 1246993: admin api doesn't always rollback transactions on failures (#418). r=nthomas,allan-silva
Attachment #8915193 - Flags: review?(nthomas) → review+
Depends on: 1411563
This is in production.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Product: Release Engineering → Release Engineering Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: