Closed Bug 664539 Opened 13 years ago Closed 13 years ago

update verify should retry if it gets an empty result from AUS

Categories

(Release Engineering :: General, defect, P3)

x86_64
Linux
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bhearsum, Assigned: catlee)

Attachments

(3 files)

We already retry in update verify if we fail to _retrieve_ the snippet, but we don't retry if we get an empty snippet. This can cause spurious burning if AUS loses its mind momentarily.

We might be able to add some arguments to http://hg.mozilla.org/build/tools/file/276961805e0a/release/common/download_mars.sh#l12 to do it. If not, we'll have to do something smarter in that function.
We'll probably get it for free, but we should make sure this happens for final verification, too. We hit a similar issue during 6.0b5:
FAIL: no complete update found for https://aus2.mozilla.org/update/1/Firefox/6.0/20110705195857/Darwin_x86_64-gcc3-u-i386-x86_64/ta/releasetest/update.xml?force=1
FAIL: download_mars returned non-zero exit code: 1
This gives output like this:

Using  https://aus3.mozilla.org/update/1/Firefox/8.0/20111006182035/Linux_x86-gcc3/zu/betatest/update.xml?force=1
Calling <function run_with_timeout at 0xb7c5f02c> with args: (['wget', '--no-check-certificate', '-S', '-O', 'update.xml', 'https://aus3.mozilla.org/update/1/Firefox/8.0/20111006182035/Linux_x86-gcc3/zu/betatest/update.xml?force=1'], 300, None, None, False, True), kwargs: {}, attempt #1
Executing: ['wget', '--no-check-certificate', '-S', '-O', 'update.xml', 'https://aus3.mozilla.org/update/1/Firefox/8.0/20111006182035/Linux_x86-gcc3/zu/betatest/update.xml?force=1']
Process stdio:

Process stderr:
--13:39:11--  https://aus3.mozilla.org/update/1/Firefox/8.0/20111006182035/Linux_x86-gcc3/zu/betatest/update.xml?force=1
Resolving aus3.mozilla.org... 63.245.209.149
Connecting to aus3.mozilla.org|63.245.209.149|:443... connected.
HTTP request sent, awaiting response... 
  HTTP/1.1 200 OK
  Date: Wed, 12 Oct 2011 21:07:27 GMT
  Server: Apache
  X-Backend-Server: pm-app-dist05
  X-Powered-By: PHP/5.1.6
  Set-Cookie: aus2a=63.245.220.220.1318453647.0357; expires=Wed, 12-Oct-2016 02:11:17 GMT; path=/; domain=aus2.mozilla.org
  Cache-Control: no-store, must-revalidate, post-check=0, pre-check=0, private
  Content-Length: 919
  Keep-Alive: timeout=5, max=199
  Connection: Keep-Alive
  Content-Type: text/xml;
Cookie coming from aus3.mozilla.org attempted to set domain to aus2.mozilla.org
Length: 919 [text/xml]
Saving to: `update.xml'

     0K                                                       100%  535M=0s

13:39:11 (535 MB/s) - `update.xml' saved [919/919]


Got this response:
<?xml version="1.0"?>
<updates>
    <update type="minor" version="8.0 Beta" extensionVersion="8.0" buildID="20111011182523" detailsURL="https://www.mozilla.com/zu/firefox/8.0/releasenotes/">
        <patch type="complete" URL="http://stage-old.mozilla.org/pub/mozilla.org/firefox/nightly/8.0b3-candidates/build1/update/linux-i686/zu/firefox-8.0b3.complete.mar" hashFunction="SHA512" hashValue="19bf9de7cc2d8664147af90c919ba701e05616a21888705d294c4fd1cfff99f6b40473c970683093dc9f36926dcc013e76995c26733125cb1d3bb26ebb065d05" size="16480509"/>
        <patch type="partial" URL="http://stage-old.mozilla.org/pub/mozilla.org/firefox/nightly/8.0b3-candidates/build1/update/linux-i686/zu/firefox-8.0b2-8.0b3.partial.mar" hashFunction="SHA512" hashValue="463d9b70759e1b873dbeb67265646355600dfd0987c88d015159b07285570765b306ce09660829c77e868c77d312a35364a5f6b17217caff28faa497b18ee69b" size="840361"/>
    </update>
</updates>


X-Backend-Server might help us tell if one particular webhead is being a problem, and seeing update.xml will confirm we're getting an empty update.
Attachment #566627 - Flags: review?(bhearsum)
Attachment #566627 - Flags: review?(bhearsum) → review+
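To show how the X-Backend-Server header could be pulled out of the logged headers, here is a minimal sketch. The helper name `backend_server` and the temporary file are hypothetical, not part of the attached patch. It assumes the `wget -S` stderr has been saved to a file, for example with `2> wget.log`.

```shell
# Hypothetical helper: print the X-Backend-Server value found in saved
# `wget -S` output. If the header appears more than once, the last one wins.
backend_server () {
    sed -n 's/^ *X-Backend-Server: *//p' "$1" | tail -1
}

# Header lines trimmed from the log in this bug.
headers=$(mktemp)
cat > "$headers" <<'EOF'
  HTTP/1.1 200 OK
  Server: Apache
  X-Backend-Server: pm-app-dist05
  Content-Length: 919
EOF

webhead=$(backend_server "$headers")
echo "Served by: $webhead"
rm -f "$headers"
```

Recording this value next to each empty snippet would make it possible to tally which webheads are involved.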
The debugging patch broke final verification by adding one of these lines for each aus query
 HTTP request sent, awaiting response... 
which then match in a 'grep HTTP' when we don't want them to. Adding the forward slash fixes it by matching only lines like HTTP/1.1. We don't need to worry about Windows and slash-munging because all the verifications run on Linux.
Attachment #567036 - Flags: review?(bhearsum)
Attachment #567036 - Flags: review?(bhearsum) → review+
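The difference between the two grep patterns can be seen on a shortened copy of the wget output logged earlier in this bug. This is only an illustration: the temporary file and the use of `grep -c` are assumptions, and the real final verification script may call grep differently. Only the patterns `HTTP` and `HTTP/` come from the comment above.

```shell
# Sample `wget -S` output, trimmed from the log earlier in this bug.
log=$(mktemp)
cat > "$log" <<'EOF'
Connecting to aus3.mozilla.org|63.245.209.149|:443... connected.
HTTP request sent, awaiting response... 
  HTTP/1.1 200 OK
  Date: Wed, 12 Oct 2011 21:07:27 GMT
EOF

# 'HTTP' also matches the "HTTP request sent" progress line.
loose_matches=$(grep -c 'HTTP' "$log")
# 'HTTP/' matches only the status line.
strict_matches=$(grep -c 'HTTP/' "$log")

echo "loose: $loose_matches, strict: $strict_matches"
rm -f "$log"
```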
I had a quick peek at an update verify log for 8.0b5 and found that there were at least three webheads serving empty snippets (01, 05, and 08): http://buildbot-master08.build.sjc1.mozilla.com:8001/builders/release-mozilla-beta-win32_update_verify_5%2F10/builds/19/steps/run_script/logs/stdio
I saw 04 and 07 as well.
OK, we should ask IT whether this could be an issue with the Zeus load balancer that sits in front of the webheads.
I couldn't get AUS to give me an empty response on purpose, so I tested by adding this:

diff --git a/release/common/download_mars.sh b/release/common/download_mars.sh
--- a/release/common/download_mars.sh
+++ b/release/common/download_mars.sh
@@ -9,16 +9,22 @@ download_mars () {
     test_only="$3"
 
     max_tries=5
     try=1
     while [ "$try" -lt "$max_tries" ]; do
         echo "Using  $update_url"
         $retry wget --no-check-certificate -S -O update.xml $update_url
 
+        if [ "$RANDOM" -lt 30000 ]; then
+            echo "<?xml version=\"1.0\"?>" > update.xml
+            echo "<updates>" >> update.xml
+            echo "</updates>" >> update.xml
+        fi
+
         echo "Got this response:"
         cat update.xml
         # If the first line after <updates> is </updates> then we have an
         # empty snippet. Otherwise we're done
         if [ "$(grep -A1 '<updates>' update.xml | tail -1)" != "</updates>" ]; then
             break;
         fi
         echo "Empty response, sleeping"
Attachment #576631 - Flags: review?(nrthomas)
Comment on attachment 576631 [details] [diff] [review]
retry on empty responses

>+    max_tries=5
>+    try=1
>+    while [ "$try" -lt "$max_tries" ]; do

Use -le so the loop makes max_tries attempts instead of max_tries-1, and add a comment like this
  # retrying until we get offered an update

>+        echo "Using  $update_url"
>+        $retry wget --no-check-certificate -S -O update.xml $update_url

Please add a comment like
  # retrying until AUS gives us any response at all
 

>+        if [ "$RANDOM" -lt 30000 ]; then
>+            echo "<?xml version=\"1.0\"?>" > update.xml
>+            echo "<updates>" >> update.xml
>+            echo "</updates>" >> update.xml
>+        fi

Please leave out the test code on landing.
Attachment #576631 - Flags: review?(nrthomas) → review+
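For reference, here is a rough sketch of what the retry loop might look like with the review comments applied: `-le` instead of `-lt`, the suggested comment added, and the `$RANDOM` test code left out. It is not the landed patch. `fetch_snippet` is a hypothetical stand-in for the `$retry wget` call so the sketch runs without network access, and `is_empty_snippet`, `fetch_until_update`, `RETRY_SLEEP` and the 10 second default are assumptions, since the quoted diff is cut off before the sleep.

```shell
# Hypothetical stand-in for:
#   $retry wget --no-check-certificate -S -O update.xml $update_url
# It writes an empty snippet twice, then a snippet that offers an update.
fetch_count=0
fetch_snippet () {
    fetch_count=$((fetch_count + 1))
    if [ "$fetch_count" -lt 3 ]; then
        printf '%s\n' '<?xml version="1.0"?>' '<updates>' '</updates>' > "$1"
    else
        printf '%s\n' '<?xml version="1.0"?>' '<updates>' \
            '    <update type="minor" version="8.0 Beta"/>' '</updates>' > "$1"
    fi
}

# Same check as the patch: the snippet is empty if the line right after
# <updates> is </updates>.
is_empty_snippet () {
    [ "$(grep -A1 '<updates>' "$1" | tail -1)" = "</updates>" ]
}

fetch_until_update () {
    max_tries=5
    try=1
    # retrying until we get offered an update
    while [ "$try" -le "$max_tries" ]; do
        fetch_snippet "$1"
        if ! is_empty_snippet "$1"; then
            return 0
        fi
        echo "Empty response, sleeping"
        sleep "${RETRY_SLEEP:-10}"   # assumed interval, not from the patch
        try=$((try + 1))
    done
    return 1
}

# Usage: no delay between attempts for this demonstration.
RETRY_SLEEP=0
snippet_file=$(mktemp)
fetch_until_update "$snippet_file" && echo "got an update after $fetch_count fetches"
```

With `-le` the loop allows up to five fetches. With the original `-lt` it would stop after four.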
Assignee: nobody → catlee
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering