Closed Bug 628796 Opened 9 years ago Closed 7 years ago

final verification logs should have a separator between requests

Categories

(Release Engineering :: General, defect, P5)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jhford, Assigned: pmoore)

References

Details

(Whiteboard: [automation][releases][simple])

Attachments

(3 files, 12 obsolete files)

1.66 MB, text/plain
Details
328.90 KB, text/plain
Details
21.76 KB, patch
bhearsum
: review+
bhearsum
: checked-in+
Details | Diff | Splinter Review
It would be a lot easier to read these logs if there were visual separators between each url check.
OS: Linux → All
Priority: -- → P5
Hardware: x86_64 → All
Whiteboard: [automation][releases][simple]
Shell script called by automation is here:

https://hg.mozilla.org/build/tools/file/9be0576d239c/release/final-verification.sh

...which calls out to:

https://hg.mozilla.org/build/tools/file/9be0576d239c/release/updates/verify.sh

verify.sh is where the delimiter should go.

Here's an example of it being run:

http://buildbot-master30.srv.releng.scl3.mozilla.com:8001/builders/release-mozilla-release-linux64_final_verification/builds/1/steps/final_verification/logs/stdio

Should be pretty clear from the output why we'd like a delimiter of some sort. ;)

I would go even further and say that a delimiter+quick summary of the entry currently being eval-ed would be ideal. We could tell which combo was failing at a glance then. 

Bonus points for creating a list of failures and dumping a summary of them at the end of the log.
Assignee: nobody → pmoore
My progress on this can be tracked here:

https://github.com/petemoore/build-tools/tree/bug628796

I have discovered several additional issues with the verify procedure:

 1) Several urls are tested multiple times, e.g. take a look at the log file here:

        http://buildbot-master30.srv.releng.scl3.mozilla.com:8001/builders/release-mozilla-release-linux64_final_verification/builds/1/steps/final_verification/logs/stdio

    There are 1487 urls tested, but only 355 unique ones.

        $ cat stdio | grep '^Testing http' | wc -l
            1487
        $ cat stdio | grep '^Testing http' | sort -u | wc -l
             355
        $ 

    This is not because particular urls failed and were retried - the multiple attempts for a single url are scattered throughout the log, rather than appearing one after another, as they would if a failing url were retried several times.

 2) The logging is unnecessarily verbose when the url tests pass (in the example above, 93047 lines of logging for only 355 unique urls) - instead, a single line for each successfully retrieved url would be sufficient, with more logging provided in the case of failure. This could reduce the logging to around 1-2% of the current size, making it much more human-readable, and just as useful.

 3) There are several options that can be passed to the verify.sh script, but in the tools project, only verify.sh -t is ever called - this means that unless there is code outside of the tools repository that is calling verify.sh, there is potentially a lot of code that is never called. I will leave this here for now, in case it is used by an external project, but I could not find any reference to it in tools, buildbotcustom or buildbot-configs.

 4) The verify.sh script is the only script in the tools project base that is sourcing the "common" scripts:

        release/common/download_mars.sh
        release/common/download_builds.sh
        release/common/check_updates.sh

    In general I am not a fan of sourcing function libraries in shell scripts: it can be very difficult to debug when libraries accidentally define the same function names, or execute code outside of functions that sets variables, so it becomes hard to know what has influenced the current shell (executed code, changed environment variables, etc.) because of something that was sourced. I think a cleaner solution is to define the helper functions as executable shell scripts, so each runs in its own process with a clean environment and e.g. its environment variables cannot affect the calling script. This separation is particularly important for libraries that may be used by multiple calling scripts. For example, rather than sourcing download_mars.sh and later calling the download_mars function, it would be better to execute download_mars.sh directly, with the script receiving the inputs it requires as parameters (currently taken by the function), so it can be used safely in isolation. This also has other benefits: since it is directly executable in its own process, it can be called from a python script or from a different programming language. Otherwise, to call download_mars from python, you would first need a wrapper script that sources the lib and then calls the function.

    This should also not result in more code - because currently the function processes the inputs, and this would now just be moved "up" so that the script directly processes the input parameters.
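The alternative argued for in (4) can be sketched as a tiny runnable example. The helper body and its single parameter are illustrative stand-ins, not the real download_mars.sh interface:

```shell
# Hypothetical sketch: a helper implemented as an executable script taking
# parameters, run in its own process, instead of a sourced function library.
helper="$(mktemp)"
cat > "${helper}" <<'EOF'
#!/bin/sh
# Inputs arrive as positional parameters; nothing leaks into the caller.
mar_url="${1}"
echo "would download: ${mar_url}"
EOF
chmod +x "${helper}"
# The caller's environment is untouched by anything the helper does.
result="$("${helper}" 'http://example.com/firefox.complete.mar')"
echo "${result}"
rm -f "${helper}"
```

Because the helper is a plain executable, the same invocation would work from python or any other language, which is the isolation benefit described above.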
Status: NEW → ASSIGNED
I've decided to refactor somewhat, to deal with the issues above and to simplify matters (currently the implementation totals 690 lines across 5 files - for such a simple task this is not necessary). My patch is committed on github (see the link in my previous comment) and I am currently testing it - assuming the tests run fine overnight on both Mac OS X (Darwin) and Linux, I'll attach the patch to this bug tomorrow and submit it for review.

Thanks,
Pete
Attached patch Proposed patch for tools repo (obsolete) — Splinter Review
It's quite a cleanup - let me know how you feel about this. I believe the new version is slicker (no risk of multiple tests for the same url), cleaner, and shorter, and the output is more appropriate and easier to digest. I believe we may also be able to remove several dependencies the old implementation was using, since they do not seem to be referenced elsewhere in any codebase I have found - however, let's discuss, as I might not have all the entry points!

Additional: please note the new implementation works on both Mac and Linux (the old one used both curl and wget; the new one uses curl only, with simplified sed expressions, etc. - so it is now fully compatible).

Thanks,
Pete
Attachment #728782 - Flags: review?(coop)
Whoops - attached old version. This is the correct one!
Attachment #728782 - Attachment is obsolete: true
Attachment #728782 - Flags: review?(coop)
Attachment #728784 - Flags: review?(coop)
Comment on attachment 728784 [details] [diff] [review]
accidentally attached old version of patch - this is updated version

Review of attachment 728784 [details] [diff] [review]:
-----------------------------------------------------------------

Changes look good.

Next step is to provide a side-by-side example of the old format output and new format output for the same set of updates. This will let people see how awesome the new format is (especially by comparison), but also has the added benefit of ensuring that your dev-master is correctly set up and able to run release builders.

It doesn't need to be run against a full set of updates. Indeed, it might be best tested against a user repo with a pruned update list that includes a few update combinations that have been altered (e.g. bogus locale) so that they would never work.

::: release/final-verification.sh
@@ +8,3 @@
>  if [ $# -lt 1 ]; then
> +    echo "Usage: $(basename "${0}") [list of update verify configs]" >&2
> +    exit 127

I don't care much provided the return code is > 0, but isn't 127 "reserved" for |command not found|?
Attachment #728784 - Flags: review?(coop) → review+
cc-ing Ben because he'll probably be very interested in the new format.
Hi Chris,

I've made the change you suggested regarding the exit code, and have considerably improved logging and performance.

The script now forks multiple (a configurable number of) processes to handle curling the urls. This has resulted in massive performance improvements.

I will also attach a log file from an example run.

There is also now a help option with the command, and the command is more careful in checking the input parameters, to make sure the inputs are valid before running.

There is also a new "KEY STATS" section at the end to give an overview of the run.

Let me know if you have any questions. Will attach the log file now...

Pete
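The fan-out described above can be sketched with xargs -P, which caps how many worker processes run concurrently. Here `echo checked` stands in for the real curl invocation, and the url list and worker count are illustrative; the production script's mechanism may differ:

```shell
# Minimal sketch of parallel url checking: xargs -P runs up to max_procs
# workers at once, one url per invocation. Output is sorted afterwards so
# results are in a deterministic order regardless of completion order.
max_procs=8
results="$(printf '%s\n' url1 url2 url3 \
    | xargs -n 1 -P "${max_procs}" echo checked \
    | sort)"
echo "${results}"
```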
Attachment #729663 - Flags: review?(coop)
Attachment #729663 - Flags: review?(bhearsum)
Attached file example log output from new script (obsolete) —
This is an example of the new logging.

The previous version looked like this:
http://buildbot-master30.srv.releng.scl3.mozilla.com:8001/builders/release-mozilla-release-linux64_final_verification/builds/1/steps/final_verification/logs/stdio

Please note the new version runs considerably faster (89 seconds compared to 10 minutes). You can see this from comparing the output of the above url with the content of this attachment (bug628796_example_output.txt).
Attachment #728784 - Attachment is obsolete: true
Attachment #729666 - Flags: review?(coop)
Attachment #729666 - Flags: review?(bhearsum)
Spotted that the mar download urls were not properly decoded, e.g.:

    http://download.mozilla.org/?product=firefox-19.0.2-complete&amp;os=linux64&amp;lang=ach&amp;force=1

should have been:

    http://download.mozilla.org/?product=firefox-19.0.2-complete&os=linux64&lang=ach&force=1

This has been addressed in the updated version of this patch.
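The decoding issue can be illustrated with a small sketch. The sed expression is illustrative of the kind of fix, not necessarily the patch's actual code:

```shell
# The '&amp;' entities from update.xml attribute values must become plain
# '&' before the url is fetched (in a sed replacement, '&' must be escaped
# as '\&' to be literal).
encoded='http://download.mozilla.org/?product=firefox-19.0.2-complete&amp;os=linux64&amp;lang=ach&amp;force=1'
decoded="$(printf '%s' "${encoded}" | sed 's/&amp;/\&/g')"
echo "${decoded}"
```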
Attachment #729663 - Attachment is obsolete: true
Attachment #729663 - Flags: review?(coop)
Attachment #729663 - Flags: review?(bhearsum)
Attachment #730125 - Flags: review?(coop)
Attachment #730125 - Flags: review?(bhearsum)
Attached file Updated log file example (obsolete) —
Due to the fix for '&amp;' => '&' in mar download urls (see attachment 730125 [details] [diff] [review]), I have attached a new example log file that demonstrates the fix in action.
Attachment #729666 - Attachment is obsolete: true
Attachment #729666 - Flags: review?(coop)
Attachment #729666 - Flags: review?(bhearsum)
Attachment #730129 - Flags: review?(coop)
Attachment #730129 - Flags: review?(bhearsum)
Comment on attachment 730125 [details] [diff] [review]
Fix for '&amp;' appearing in mar download urls

Review of attachment 730125 [details] [diff] [review]:
-----------------------------------------------------------------

It'd be really great to get some sample output with failures. Based on my reading of the code I'm not sure whether or not we'll get the information we want for debugging. It's really important for us to get full headers when we have failures (including headers from 301s or other redirects). Ideally we'd have that information output only upon failures, but if it's always there (and clearly separated, like this bug asks for), that's fine too.

I'd prefer to have line-by-line output while MAR URLs are being gathered. Without it, it's impossible to be sure that every update URL was tested.

I really love the optimization of only checking each MAR once -- it makes me wonder why we never thought to do it before!

This code has gotten complex to the point where it might be worthwhile converting it to Python, but if you're happy with it in bash that's fine too.
Attachment #730125 - Flags: review?(bhearsum) → feedback+
Attachment #730129 - Flags: review?(bhearsum)
Hi Ben,

You are quite right - it would be good to have some log examples with errors, so I have been busy doing the following:

1) Creating some .cfg file entries that are fake (e.g. a fake aus_server, false locale, unusual patch-type) and also setting up a "dummy" update.xml server that serves bad urls for mar files (invalid version numbers from the mozilla server, also urls that redirect to a different site, and urls that are simply invalid)
2) Running both the old and new version of the final-verification against the same modified .cfg file, to compare results.

This will cover as many test cases as possible, and my aim is to make sure that both give identical results regarding their analysis of which steps failed, and why.

I have also increased the logging output to show not only the mar download urls, but also all the tested update.xml urls (derived simply from the .cfg files).

I will apply a new patch with this as soon as it is ready and I've finished the testing. Then I will add a new attachment, which is the equivalent output for the same "dummy run" using the old final-verification.sh script.

Watch this space! :)

Pete
Comment on attachment 730129 [details]
Updated log file example

Looks good to me. Pinging Ben for additional feedback.

What causes the 3 minute jump after the "Starting stopwatch..." message?
Attachment #730129 - Flags: review?(coop)
Attachment #730129 - Flags: review+
Attachment #730129 - Flags: feedback?(bhearsum)
Attachment #730125 - Flags: review?(coop) → review+
Comment on attachment 730129 [details]
Updated log file example

I'm going to wait for the updated log to give my thoughts. Pete, let me know if you need a hand forcing failures or anything.
Attachment #730129 - Flags: feedback?(bhearsum)
Hi guys,

We've had a couple of holiday days here in Germany - back on this now. To answer above questions:

1) The mar urls are tested in parallel from multiple processes. Rather than interleave output from these processes as it arrives, which could be confusing due to the strange ordering of entries, I sort the urls once the subprocesses have all completed. This keeps the entries sorted for easy look-up, with the failures listed together and the successes too. The sort can only run once the individual subprocesses have finished, which is why there appears to be a delay: the output arrives after all subprocesses have completed. I can remove the sort if you like, so you see results as they occur, but be aware that failures and successes will then be mixed together, and the output may be in a somewhat jumbled order.

These are the lines that are sorted:

<timestamp>:  <mar url> succeeded
<timestamp>:  FAILURE: Could not retrieve http header for mar file from <mar url>

2) I've now set up various test cases to catch as many possibilities as possible, which are currently running, and I will provide a side-by-side comparison of the results from before the change vs. after.

Please note I attempted this on Thursday, but accidentally ran:

final-verification.sh <other config files> update.cfg

This created a process which filled up the filesystem and brought down various services on the machine within a couple of hours, since "update.cfg" was listed as an argument. My apologies for this. The new version of final-verification.sh will not do this. This unforeseen error caused a small delay to the full testing.
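The batch-then-sort behaviour described in (1) can be sketched as follows; the file names and worker count are illustrative:

```shell
# Each worker writes its own log; the parent waits for all of them, then
# sorts the combined output so failures and successes group together.
workdir="$(mktemp -d)"
for i in 3 1 2; do
    ( echo "worker ${i}: done" > "${workdir}/worker_${i}.log" ) &
done
wait   # the sort can only run once every subprocess has finished
combined="$(cat "${workdir}"/worker_*.log | sort)"
echo "${combined}"
rm -rf "${workdir}"
```

The `wait` is why the sorted block appears after a pause in the log: nothing is printed until the slowest worker finishes.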
New changes include:

  * Now shows HTTP 302 redirects in logging
  * Logs in real time, rather than batching up and sorting output
  * Now checks that the size of the mar file is correct
  * Now outputs update.xml urls (not just mar file urls)
  * Now outputs a summary report at the end of the log file for every failure, explaining what failed and why it was being tested, right back to the original line(s) of the .cfg file(s) that caused this test to run
Attachment #730125 - Attachment is obsolete: true
Attachment #732839 - Flags: review?(bhearsum)
Attachment #732839 - Flags: feedback?(coop)
Attached file Logging example, with patch applied (obsolete) —
Attachment #730129 - Attachment is obsolete: true
Attachment #732854 - Flags: review?(bhearsum)
Attachment #732854 - Flags: feedback?(coop)
Now with correct attachment ...
Attachment #732839 - Attachment is obsolete: true
Attachment #732839 - Flags: review?(bhearsum)
Attachment #732839 - Flags: feedback?(coop)
Attachment #732860 - Flags: review?(bhearsum)
Attachment #732860 - Flags: feedback?(coop)
Attachment #732854 - Flags: review?(bhearsum)
Attachment #732854 - Flags: feedback?(coop)
Comment on attachment 732860 [details] [diff] [review]
Proposed patch for build tools repo

Review of attachment 732860 [details] [diff] [review]:
-----------------------------------------------------------------

All of the output from the latest log looks good, but I'm still not sure what we're going to get when an update points to a MAR that doesn't exist. That's actually the case I'm most concerned about. Sometimes we hit 500s, 403s, 404s - and it's important for us to know each step we took to get there (which load balancer(s) we passed through, which headers we got, the final node, etc.). Do you know what will happen in that case? Let me know if you have trouble constructing a useful test case for that.
(In reply to Ben Hearsum [:bhearsum] from comment #21)
> All of the output from the latest log looks good, but I'm still not sure
> what we're going to get when an update points to a MAR that doesn't exist.
> That's actually the case I'm most concerned about. Sometimes we hit 500s,
> 403s, 404s - and it's important for us to know each step we took to get
> there (which load balancer(s) we passed through, which headers we got, the
> final node, etc.). Do you know what will happen in that case? Let me know if
> you have trouble constructing a useful test case for that.

Just to be clear here, Ben, would this be a regression from the current state where all that info is discernible from the current log, or is that an idealized new state?
(In reply to Chris Cooper [:coop] from comment #22)
> (In reply to Ben Hearsum [:bhearsum] from comment #21)
> > All of the output from the latest log looks good, but I'm still not sure
> > what we're going to get when an update points to a MAR that doesn't exist.
> > That's actually the case I'm most concerned about. Sometimes we hit 500s,
> > 403s, 404s - and it's important for us to know each step we took to get
> > there (which load balancer(s) we passed through, which headers we got, the
> > final node, etc.). Do you know what will happen in that case? Let me know if
> > you have trouble constructing a useful test case for that.
> 
> Just to be clear here, Ben, would this be a regression from the current
> state where all that info is discernible from the current log, or is that an
> idealized new state?

That'd be a regression. Currently we get things like:
Using  https://aus3.mozilla.org/update/1/Firefox/10.0/20111228055358/Darwin_x86_64-gcc3-u-i386-x86_64/mr/releasetest/update.xml?force=1
retry: Calling <function run_with_timeout at 0x7f5f1b0c2230> with args: (['wget', '--progress=dot:mega', '--no-check-certificate', '--server-response', '-O', 'update.xml', 'https://aus3.mozilla.org/update/1/Firefox/10.0/20111228055358/Darwin_x86_64-gcc3-u-i386-x86_64/mr/releasetest/update.xml?force=1'], 300, None, None, False, True), kwargs: {}, attempt #1
--2013-01-31 13:26:30--  https://aus3.mozilla.org/update/1/Firefox/10.0/20111228055358/Darwin_x86_64-gcc3-u-i386-x86_64/mr/releasetest/update.xml?force=1
Executing: ['wget', '--progress=dot:mega', '--no-check-certificate', '--server-response', '-O', 'update.xml', 'https://aus3.mozilla.org/update/1/Firefox/10.0/20111228055358/Darwin_x86_64-gcc3-u-i386-x86_64/mr/releasetest/update.xml?force=1']
Resolving aus3.mozilla.org... 63.245.217.44
Connecting to aus3.mozilla.org|63.245.217.44|:443... connected.
HTTP request sent, awaiting response... 
  HTTP/1.1 200 OK
  Server: Apache
  X-Backend-Server: pp-app-dist03
  Cache-Control: no-store, must-revalidate, post-check=0, pre-check=0, private
  Content-Type: text/xml;
  Date: Thu, 31 Jan 2013 21:26:31 GMT
  Keep-Alive: timeout=5, max=975
  Connection: Keep-Alive
  Set-Cookie: aus2a=63.245.214.82.1359667591.0366; expires=Thu, 01-Feb-2018 02:30:21 GMT; path=/; domain=aus2.mozilla.org
  X-Powered-By: PHP/5.1.6
  X-Cache-Info: not cacheable; response specified "Cache-Control: no-store"
  Content-Length: 575
Cookie coming from aus3.mozilla.org attempted to set domain to aus3.mozilla.org
Length: 575 [text/xml]
Saving to: “update.xml”

     0K                                                      100% 20.7M=0s

2013-01-31 13:26:31 (20.7 MB/s) - “update.xml” saved [575/575]

Got this response:
<?xml version="1.0"?>
<updates>
    <update type="minor" displayVersion="19.0 Beta 4" appVersion="19.0" platformVersion="19.0" buildID="20130130080006" detailsURL="https://www.mozilla.com/mr/firefox/19.0/releasenotes/" actions="silent">
        <patch type="complete" URL="http://download.mozilla.org/?product=firefox-19.0b4-complete&amp;os=osx&amp;lang=mr&amp;force=1" hashFunction="SHA512" hashValue="733d23dbc0b346c9b943e0abd2f1d34c772ad737ddcfba579485248813ddcfde6ef809af0a22ce3ae922389eb23869ff48e5cef19385be9c3e52cee8c2b9b633" size="38277609"/>
    </update>
</updates>

Testing http://download.mozilla.org/?product=firefox-19.0b4-complete&os=osx&lang=mr&force=1
HTTP/1.1 302 Found
Server: Apache
X-Backend-Server: bouncer6.webapp.phx1.mozilla.com
Cache-Control: max-age=15
Content-Type: text/html; charset=UTF-8
Date: Thu, 31 Jan 2013 21:26:31 GMT
Location: http://download.cdn.mozilla.net/pub/mozilla.org/firefox/releases/19.0b4/update/mac/mr/firefox-19.0b4.complete.mar
Transfer-Encoding: chunked
Connection: Keep-Alive
Set-Cookie: dmo=10.8.81.218.1359667591594158; path=/; expires=Fri, 31-Jan-14 21:26:31 GMT
X-Cache-Info: caching

FAIL: download_mars returned non-zero exit code: 56
Comment on attachment 732860 [details] [diff] [review]
Proposed patch for build tools repo

Review of attachment 732860 [details] [diff] [review]:
-----------------------------------------------------------------

Just a few nits, but f+ overall. You've already realized that bhearsum should be doing the full review here.

I'm no stranger to shell scripts, but the longer this script becomes, the more I want to see it ported to mozharness + python libraries.

::: release/final-verification.sh
@@ +124,5 @@
> +# entries related to the exception that occurred
> +# e.g.
> +#
> +# UPDATE_XML_UNAVAILABLE http://fake.aus.server.org/update/1/Firefox/4.0b9/20110110191600/Linux_x86-gcc3/uk/releasetest/update.xml?force=1
> +# PATCH_TYPE_MISSING http://revolutioner.org/update/1/Firefox/4.0b15/20101214164501/Linux_x86-gcc3/zh-CN/releasetest/update.xml?force=1 complete 

Trailing ws

@@ +208,5 @@
> +    while read failure_code entry1 entry2 entry3 entry4
> +    do
> +        case "${failure_code}" in
> +
> +            UPDATE_XML_UNAVAILABLE) 

Trailing ws

@@ +217,5 @@
> +                show_cfg_file_entries "${update_xml_url}"
> +                log ""
> +                ;;
> +
> +            UPDATE_XML_REDIRECT_FAILED) 

Trailing ws

@@ +253,5 @@
> +                show_update_xml_entries "${mar_url}"
> +                log ""
> +                ;;
> +
> +            MAR_FILE_WRONG_SIZE) 

Trailing ws

::: release/get-update-xml.sh
@@ +11,5 @@
> +    update_xml_actual_url="$(cat "${update_xml_headers}" | sed "s/$(printf '\r')//" | sed -n 's/^Location: //p')"
> +    [ -n "${update_xml_actual_url}" ] && update_xml_url_with_redirects="${update_xml_url} => ${update_xml_actual_url}" || update_xml_url_with_redirects="${update_xml_url}"
> +    echo "$(date):  Downloaded update.xml file from ${update_xml_url_with_redirects}" >&2
> +    for patch_type in ${patch_types//,/ }
> +    do  

Trailing ws

@@ +19,5 @@
> +            echo "$(date):  FAILURE: No patch type '${patch_type}' found in update.xml from ${update_xml_url_with_redirects}" >&2
> +            echo "PATCH_TYPE_MISSING ${update_xml_url} ${patch_type} ${update_xml_actual_url}" >> "${failures}"
> +        else
> +            echo "${mar_url_and_size}"
> +            echo "$(date):  Successfully extracted mar url and mar file size for patch type '${patch_type}' from ${update_xml_url_with_redirects}" >&2

Is it worthwhile using the same report structure for success as for failure, i.e. "$(date): SUCCESS: extracted ..." ?

::: release/test-mar-url.sh
@@ +13,5 @@
> +[ -n "${mar_actual_url}" ] && mar_url_with_redirects="${mar_url} => ${mar_actual_url}" || mar_url_with_redirects="${mar_url}"
> +
> +if [ "${mar_actual_size}" == "${mar_required_size}" ]
> +then
> +    echo "$(date):  Mar file ${mar_url_with_redirects} available with correct size (${mar_actual_size} bytes)" >&2

Is it worthwhile using the same report structure for success as for failure, i.e. "$(date): SUCCESS: Mar file ..." ?
Attachment #732860 - Flags: feedback?(coop) → feedback+
Attached file Logging example, with patch applied (obsolete) —
See updated log file, which now:

1) shows curl exit codes for failures
2) shows http headers, when there are failures
3) numbers the failures as it lists them at end of log file, in case there are many, it makes it easier to communicate with people about an error if they each have a unique number (e.g. "see FAILURE 16 in log file XXXXX")
Attachment #732840 - Attachment is obsolete: true
Updated patch, to include latest code changes.
Attachment #732860 - Attachment is obsolete: true
Attachment #732860 - Flags: review?(bhearsum)
Attachment #734680 - Flags: review?(bhearsum)
Attachment #734680 - Flags: feedback?(coop)
Attachment #734679 - Attachment mime type: text/x-log → text/plain
Attachment #734679 - Attachment is obsolete: true
Attachment #734680 - Attachment is obsolete: true
Attachment #734680 - Flags: review?(bhearsum)
Attachment #734680 - Flags: feedback?(coop)
Attachment #734715 - Flags: review?(bhearsum)
Attachment #734715 - Flags: feedback?(coop)
Comment on attachment 734715 [details] [diff] [review]
Proposed patch for build tools repo

Review of attachment 734715 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for the very detailed log examples - it's super helpful in reviewing this. I'm happy with the output now. I share coop's comment about this getting a bit long for shell, but I don't think we need to block on this. I think we'll probably look at a lot of our update related tools, including the patcher/update verify configs themselves, after balrog is in production.

I ran this locally and had some interesting results:
* I got a few failures that look like they were caused by download.mozilla.org returning a bad response. I suspect this was because of too many simultaneous requests. When I bumped down the ||ization to 64 procs I got no failures. I'm a bit concerned about what will happen in production, because we have 4 builders (one per platform), so we'll end up with ||ization * 4 requests at the same time. Given that this new final verify script does ||ization by itself, I think we should merge those 4 builders into a single one. That shouldn't block us from landing this, though. If you want to have a look at this part just let me know and I can show you the appropriate code. I filed bug 859464 for this.
* HTTP headers didn't show up for me. Instead I got:
Lun Abr  8 15:34:24 EDT 2013:      The HTTP headers were:
sed: can't read : No such file or directory

Maybe something related to sed differences between mac and linux? It might also be something else entirely; curl didn't seem to exit nicely. Here's a full failure message:
Lun Abr  8 15:34:24 EDT 2013:  FAILURE 1: Mar file is wrong size
Lun Abr  8 15:34:24 EDT 2013:  
Lun Abr  8 15:34:24 EDT 2013:      Mar file url: http://download.mozilla.org/?product=firefox-20.0-partial-16.0.2&os=osx&lang=or&force=1
Lun Abr  8 15:34:24 EDT 2013:      The http header of the mar file url says that the mar file is 25 bytes.
Lun Abr  8 15:34:24 EDT 2013:      One or more of the following update.xml file(s) says that the file should be 35354328 bytes.
Lun Abr  8 15:34:24 EDT 2013:  
Lun Abr  8 15:34:24 EDT 2013:      These are the update xml file(s) that referenced this mar:
Lun Abr  8 15:34:24 EDT 2013:          https://aus3.mozilla.org/update/1/Firefox/16.0.2/20121024073032/Darwin_x86_64-gcc3-u-i386-x86_64/or/releasetest/update.xml?force=1
Lun Abr  8 15:34:24 EDT 2013:              This contained an entry for:
Lun Abr  8 15:34:24 EDT 2013:                  patch type: partial
Lun Abr  8 15:34:24 EDT 2013:                  mar size: 35354328
Lun Abr  8 15:34:24 EDT 2013:                  mar url: http://download.mozilla.org/?product=firefox-20.0-partial-16.0.2&os=osx&lang=or&force=1
Lun Abr  8 15:34:24 EDT 2013:              The update.xml url above was retrieved because of the following cfg file entries:
Lun Abr  8 15:34:24 EDT 2013:                  mozRelease-firefox-mac64.cfg line 15: release="16.0.2" product="Firefox" platform="Darwin_x86_64-gcc3-u-i386-x86_64" build_id="20121024073032" locales="ach af ak ar as ast be bg bn-BD bn-IN br bs ca cs csb cy da de el en-GB en-US en-ZA eo es-AR es-CL es-ES es-MX et eu fa ff fi fr fy-NL ga-IE gd gl gu-IN he hi-IN hr hu hy-AM id is it ja-JP-mac kk km kn ko ku lg lij lt lv mai mk ml mr nb-NO nl nn-NO nso or pa-IN pl pt-BR pt-PT rm ro ru si sk sl son sq sr sv-SE ta ta-LK te th tr uk vi zh-CN zh-TW zu" channel="betatest" patch_types="complete partial" from="/firefox/releases/16.0.2/mac/%locale%/Firefox 16.0.2.dmg" ftp_server_from="http://stage.mozilla.org/pub/mozilla.org" ftp_server_to="http://stage.mozilla.org/pub/mozilla.org"
Lun Abr  8 15:34:24 EDT 2013:  
Lun Abr  8 15:34:24 EDT 2013:      Curl returned exit code: 
Lun Abr  8 15:34:24 EDT 2013:  
Lun Abr  8 15:34:24 EDT 2013:      The HTTP headers were:
sed: can't read : No such file or directory
Lun Abr  8 15:34:24 EDT 2013:  
Lun Abr  8 15:34:24 EDT 2013:  ====================================
Hi Ben,

Problem solved. This error occurs when there is an HTTP 503 response first, which you get when the server is busy; curl then waits and retries. The problem was simply this line:

test-mar-url.sh
===============

    mar_actual_size="$(sed -e "s/$(printf '\r')//" -n -e 's/^Content-Length: //p' "${mar_header_file}")"

This assumed there was only one line matching '^Content-Length: ' in the headers returned by the curl statement - but as you see below, when there is an HTTP 503 error (which you only see when running a high number of parallel processes, which explains why it was not picked up before), you get two matching lines: first the Content-Length of the HTTP 503 response, and then the Content-Length of the mar file, as below (from the log file you sent me):


252558 temp file (/tmp/http_header.CE89nl):
252559 + cat /tmp/http_header.CE89nl
252560 HTTP/1.0 503 Server Too Busy^M
252561 Content-Length: 25^M     <------ Content-Length of HTTP 503 error
252562 Content-Type: text/html^M
252563 ^M
252564 HTTP/1.1 302 Found^M
252565 Server: Apache^M
252566 X-Backend-Server: bouncer8.webapp.phx1.mozilla.com^M
252567 Cache-Control: max-age=15^M
252568 Content-Type: text/html; charset=UTF-8^M
252569 Date: Mon, 08 Apr 2013 21:31:51 GMT^M
252570 Location: http://download.cdn.mozilla.net/pub/mozilla.org/firefox/releases/20.0/update/mac/ach/firefox-18.0.2-20.0.partial.mar^M
252571 Transfer-Encoding: chunked^M
252572 Connection: Keep-Alive^M
252573 Set-Cookie: dmo=10.8.81.216.1365456711828396; path=/; expires=Tue, 08-Apr-14 21:31:51 GMT^M
252574 X-Cache-Info: caching^M
252575 ^M
252576 HTTP/1.1 200 OK^M
252577 Server: Apache^M
252578 X-Backend-Server: ftp2.dmz.scl3.mozilla.com^M
252579 Content-Type: application/octet-stream^M
252580 Accept-Ranges: bytes^M
252581 Access-Control-Allow-Origin: *^M
252582 ETag: "1f0520c-140cff5-4d8dcc1ff9580"^M
252583 Last-Modified: Wed, 27 Mar 2013 00:02:46 GMT^M
252584 X-Cache-Info: caching^M
252585 Content-Length: 21024757^M     <----- Content-Length of mar file
252586 Cache-Control: max-age=342778^M
252587 Expires: Fri, 12 Apr 2013 20:44:50 GMT^M
252588 Date: Mon, 08 Apr 2013 21:31:52 GMT^M
252589 Connection: keep-alive^M
252590 ^M

Adding a " | tail -1" ensures that the last one is always picked up, which corresponds to the final attempt curl makes to retrieve the mar file header.

I'll apply the fix, and attach a new patch shortly.

Nice testing! :D

Pete
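The fix can be exercised against a trimmed stand-in for a curl header dump (the header text below is illustrative, cut down from the real log):

```shell
# With a 503 retry followed by a 200, two Content-Length lines appear in
# the dumped headers; "| tail -1" keeps only the final response's value.
mar_header_file="$(mktemp)"
printf 'HTTP/1.0 503 Server Too Busy\r\nContent-Length: 25\r\n\r\nHTTP/1.1 200 OK\r\nContent-Length: 21024757\r\n\r\n' > "${mar_header_file}"
mar_actual_size="$(sed -e "s/$(printf '\r')//" -n -e 's/^Content-Length: //p' "${mar_header_file}" | tail -1)"
echo "${mar_actual_size}"
rm -f "${mar_header_file}"
```

Without the `tail -1`, the same pipeline emits both "25" and "21024757", which is what triggered the spurious wrong-size failure.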
Bug fix for HTTP 503 errors (see comment 29 above)
This should also fix the HTTP headers getting output in the case where the mar file genuinely is the wrong size (no more "sed: can't read : No such file or directory"). This secondary error was a consequence of the first error, which said the mar file was the wrong size when it wasn't.
Attachment #734715 - Attachment is obsolete: true
Attachment #734715 - Flags: review?(bhearsum)
Attachment #734715 - Flags: feedback?(coop)
Attachment #735120 - Flags: review?(bhearsum)
Comment on attachment 735120 [details] [diff] [review]
Proposed patch for build tools repo

This looks good now. We need to hold off on landing it until 20.0.1 ships, though. Should be able to get this in for next week's Beta.
Attachment #735120 - Flags: review?(bhearsum) → review+
Comment on attachment 735120 [details] [diff] [review]
Proposed patch for build tools repo

With the 20.0.1 releases now out of the way, I've landed this.
Attachment #735120 - Flags: checked-in+
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Blocks: 863428
Product: mozilla.org → Release Engineering