Closed Bug 966637 Opened 10 years ago Closed 10 years ago

Purge CDNs of firefox/releases/24.3.0esr

Component: Infrastructure & Operations Graveyard :: WebOps: Product Delivery
Type: task
Priority: Not set
Severity: critical
Tracking: not tracked
Status: RESOLVED FIXED
Reporter: rail
Assignee: cturra
Attachments: 2 files

+++ This bug was initially created as a clone of Bug #901734 +++

We had to rebuild 24.3.0esr for a code change, after it became visible to the CDNs. Please purge the path firefox/releases/24.3.0esr from all the CDNs.
:rail - as requested, i have submitted purge requests through our CDNs; i expect them to take up to 30 minutes to complete.
Thanks!
Assignee: server-ops-webops → cturra
purges have been complete for a bit now :) marking this as r/fixed.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Hmm, I'm still getting old cached data from some of the ftpX.dmz.scl3.mozilla.com machines or something that caches them. This is one of the 8 failures detected by automation. I checked the file size on stage.m.o and it's the same as the expected file size below.

Sun Feb  2 05:49:27 PST 2014:  ====================================
Sun Feb  2 05:49:27 PST 2014:  
Sun Feb  2 05:49:27 PST 2014:  FAILURE 8: Mar file is wrong size
Sun Feb  2 05:49:27 PST 2014:  
Sun Feb  2 05:49:27 PST 2014:      Mar file url: http://download.mozilla.org/?product=firefox-24.3.0esr-partial-24.2.0esr&os=win&lang=zh-TW&force=1
Sun Feb  2 05:49:27 PST 2014:      This redirected to: http://download.cdn.mozilla.net/pub/firefox/releases/24.3.0esr/update/win32/zh-TW/firefox-24.2.0esr-24.3.0esr.partial.mar
Sun Feb  2 05:49:27 PST 2014:      The http header of the mar file url says that the mar file is 3594381 bytes.
Sun Feb  2 05:49:27 PST 2014:      One or more of the following update.xml file(s) says that the file should be 3612284 bytes.
Sun Feb  2 05:49:27 PST 2014:  
Sun Feb  2 05:49:27 PST 2014:      These are the update xml file(s) that referenced this mar:
Sun Feb  2 05:49:27 PST 2014:          https://aus3.mozilla.org/update/1/Firefox/24.2.0/20131205180928/WINNT_x86-msvc/zh-TW/releasetest/update.xml?force=1
Sun Feb  2 05:49:27 PST 2014:              This contained an entry for:
Sun Feb  2 05:49:27 PST 2014:                  patch type: partial
Sun Feb  2 05:49:27 PST 2014:                  mar size: 3612284
Sun Feb  2 05:49:27 PST 2014:                  mar url: http://download.mozilla.org/?product=firefox-24.3.0esr-partial-24.2.0esr&os=win&lang=zh-TW&force=1
Sun Feb  2 05:49:27 PST 2014:              The update.xml url above was retrieved because of the following cfg file entries:
Sun Feb  2 05:49:27 PST 2014:                  mozEsr24-firefox-win32.cfg line 1: release="24.2.0" product="Firefox" platform="WINNT_x86-msvc" build_id="20131205180928" locales="ach af ak ar as ast be bg bn-BD bn-IN br bs ca cs csb cy da de el en-GB en-US en-ZA eo es-AR es-CL es-ES es-MX et eu fa ff fi fr fy-NL ga-IE gd gl gu-IN he hi-IN hr hu hy-AM id is it ja kk km kn ko ku lg lij lt lv mai mk ml mr nb-NO nl nn-NO nso or pa-IN pl pt-BR pt-PT rm ro ru si sk sl son sq sr sv-SE ta ta-LK te th tr uk vi zh-CN zh-TW zu" channel="esrtest" patch_types="complete partial" from="/firefox/releases/24.2.0esr/win32/%locale%/Firefox Setup 24.2.0esr.exe" aus_server="https://aus3.mozilla.org" ftp_server_from="http://stage.mozilla.org/pub/mozilla.org" ftp_server_to="http://stage.mozilla.org/pub/mozilla.org" to="/firefox/candidates/24.3.0esr-candidates/build2/win32/%locale%/Firefox Setup 24.3.0esr.exe"
Sun Feb  2 05:49:27 PST 2014:  
Sun Feb  2 05:49:27 PST 2014:      Curl returned exit code: 0
Sun Feb  2 05:49:27 PST 2014:  
Sun Feb  2 05:49:27 PST 2014:      The HTTP headers were:
Sun Feb  2 05:49:27 PST 2014:          HTTP/1.1 302 Found
Sun Feb  2 05:49:27 PST 2014:          Server: Apache
Sun Feb  2 05:49:27 PST 2014:          X-Backend-Server: bouncer1.webapp.phx1.mozilla.com
Sun Feb  2 05:49:27 PST 2014:          Cache-Control: max-age=15
Sun Feb  2 05:49:27 PST 2014:          Content-Type: text/html; charset=UTF-8
Sun Feb  2 05:49:27 PST 2014:          Date: Sun, 02 Feb 2014 13:49:21 GMT
Sun Feb  2 05:49:27 PST 2014:          Location: http://download.cdn.mozilla.net/pub/firefox/releases/24.3.0esr/update/win32/zh-TW/firefox-24.2.0esr-24.3.0esr.partial.mar
Sun Feb  2 05:49:27 PST 2014:          Transfer-Encoding: chunked
Sun Feb  2 05:49:27 PST 2014:          Connection: Keep-Alive
Sun Feb  2 05:49:27 PST 2014:          Set-Cookie: dmo=10.8.81.218.1391348961763369; path=/; expires=Mon, 02-Feb-15 13:49:21 GMT
Sun Feb  2 05:49:27 PST 2014:          X-Cache-Info: caching
Sun Feb  2 05:49:27 PST 2014:          
Sun Feb  2 05:49:27 PST 2014:          HTTP/1.1 200 OK
Sun Feb  2 05:49:27 PST 2014:          Server: Apache
Sun Feb  2 05:49:27 PST 2014:          X-Backend-Server: ftp5.dmz.scl3.mozilla.com
Sun Feb  2 05:49:27 PST 2014:          Content-Type: application/octet-stream
Sun Feb  2 05:49:27 PST 2014:          Accept-Ranges: bytes
Sun Feb  2 05:49:27 PST 2014:          Access-Control-Allow-Origin: *
Sun Feb  2 05:49:27 PST 2014:          ETag: "5ea988b-36d88d-4f116c07043c0"
Sun Feb  2 05:49:27 PST 2014:          Last-Modified: Wed, 29 Jan 2014 07:18:47 GMT
Sun Feb  2 05:49:27 PST 2014:          Content-Length: 3594381
Sun Feb  2 05:49:27 PST 2014:          X-Cache-Info: cached
Sun Feb  2 05:49:27 PST 2014:          Cache-Control: max-age=287755
Sun Feb  2 05:49:27 PST 2014:          Expires: Wed, 05 Feb 2014 21:45:17 GMT
Sun Feb  2 05:49:27 PST 2014:          Date: Sun, 02 Feb 2014 13:49:22 GMT
Sun Feb  2 05:49:27 PST 2014:          Connection: keep-alive
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
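For reference, failure 8 above boils down to one comparison: the size attribute of the matching patch entry in update.xml against the Content-Length the CDN returns for the MAR. The Python sketch below shows that check under stated assumptions: the sample XML is hypothetical and only mimics the AUS update.xml shape (a patch element carrying type, URL and size attributes), and `expected_sizes` and `check_size` are illustrative names, not functions from the release automation.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical sample that only mimics the shape of an AUS update.xml;
# the <patch> attributes (type, URL, size) are an assumption.
SAMPLE_UPDATE_XML = """<updates>
  <update type="minor" version="24.3.0">
    <patch type="partial"
           URL="http://download.mozilla.org/?product=firefox-24.3.0esr-partial-24.2.0esr&amp;os=win&amp;lang=zh-TW&amp;force=1"
           size="3612284"/>
  </update>
</updates>"""


def expected_sizes(update_xml: str) -> dict[str, int]:
    """Map each patch type ("partial", "complete") to the size update.xml advertises."""
    root = ET.fromstring(update_xml)
    return {patch.get("type"): int(patch.get("size")) for patch in root.iter("patch")}


def check_size(patch_type: str, update_xml: str, content_length: int) -> str | None:
    """Return a failure message if the served Content-Length disagrees, else None."""
    expected = expected_sizes(update_xml)[patch_type]
    if content_length != expected:
        return (f"Mar file is wrong size: the HTTP header says {content_length} bytes, "
                f"update.xml says {expected} bytes")
    return None
```

With the numbers from the log, `check_size("partial", SAMPLE_UPDATE_XML, 3594381)` returns the wrong-size message, while a Content-Length of 3612284 passes.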
Just to make it fun, the headers the CDN provides are a mixture of their own and what ftp originally served (e.g. X-Backend-Server). I don't think we set anything that long for Cache-Control though; I get max-age=300 for http://ftp.m.o/<that_path>.
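One way to tell which copy of the file the CDN is replaying: if ftp's ETags follow Apache's default "inode-size-mtime" hex layout (an assumption; the FileETag directive controls this), the middle field is the file size at the time the ETag was generated. A small Python sketch to decode it, where `size_from_apache_etag` is a hypothetical helper:

```python
from __future__ import annotations


def size_from_apache_etag(etag: str) -> int | None:
    """Decode the size field of an Apache-style "inode-size-mtime" ETag.

    Assumes the default three-field hex layout; returns None when the ETag
    has a different number of fields or the size field is not hex.
    """
    if etag.startswith("W/"):  # drop the weak-validator prefix, if any
        etag = etag[2:]
    fields = etag.strip('"').split("-")
    if len(fields) != 3:
        return None
    try:
        return int(fields[1], 16)
    except ValueError:
        return None
```

Decoding the ETag from the failing response, "5ea988b-36d88d-4f116c07043c0", gives 3594381: the stale Content-Length, not the 3612284 bytes update.xml expects. That fits the idea that the edge is serving headers it cached from ftp before the rebuild.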
looks like i had submitted these purge requests for an incomplete path. i used /firefox/releases/24.3.0esr instead of /pub/firefox/releases/24.3.0esr.

i have resubmitted these purge requests this morning through akamai, edgecast and highwinds.
i just got a note back from akamai (who serves a majority of our download cdn traffic currently) that my latest purge is complete. :rail - can you please give this another test for me?
Flags: needinfo?(rail)
in progress...
Still getting errors :/ Do you want me to attach the error log?
Flags: needinfo?(rail)
You can skip down to the bottom of the log file to see the summary with failures and headers.
well, we do have many layers of caching. i have resubmitted the purge request through our CDNs with a forced cache purge on our load balancers. i suspect we were serving cached content on the origin side. 

sorry for the baby steps here, but i hope this latest purge gets things sorted. i will report back when these purges are complete.
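To illustrate why purging only the CDN is not enough when the origin side also caches, here is a toy two-layer model in Python. It is a sketch, not how Zeus or any CDN actually works: `Cache`, `demo` and the byte strings are hypothetical, and a dict stands in for the ftp cluster.

```python
class Cache:
    """A toy caching layer: serves stored copies, otherwise asks the layer behind it."""

    def __init__(self, upstream):
        self.upstream = upstream  # another Cache, or a dict acting as the origin
        self.store = {}

    def get(self, path):
        if path not in self.store:
            # Both Cache and dict expose .get(path), so layers can be stacked.
            self.store[path] = self.upstream.get(path)
        return self.store[path]

    def purge(self, prefix):
        for path in [p for p in self.store if p.startswith(prefix)]:
            del self.store[path]


def demo():
    """Stale read after an edge-only purge, fresh read after purging both layers."""
    path = "/pub/firefox/releases/24.3.0esr/update/win32/zh-TW/firefox-24.3.0esr.complete.mar"
    prefix = "/pub/firefox/releases/24.3.0esr/"
    origin = {path: b"old build"}
    load_balancer = Cache(origin)
    edge = Cache(load_balancer)
    first = edge.get(path)             # both layers now hold the old bytes
    origin[path] = b"new build"        # the rebuild lands on the origin
    edge.purge(prefix)
    after_edge_purge = edge.get(path)  # refetched from the unpurged load balancer
    load_balancer.purge(prefix)
    edge.purge(prefix)
    after_full_purge = edge.get(path)
    return first, after_edge_purge, after_full_purge
```

The design point the model makes: purge from the inside out (load balancer first, then the CDN edges), otherwise an edge can refetch the stale copy from a layer that has not been purged yet.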
No worries and thanks a lot for your help!
okay, all the purges should be complete. let's give this another test?
Flags: needinfo?(rail)
without wiping /all/ caches completely, i've been pretty aggressive with my purges. can you please confirm that the content being served out of download-origin.cdn.mozilla.net has the expected byte size(s)?
Flags: needinfo?(rail)
Does that wipe include the CDNs too? I've seen this:

# origin-like but not the actual origin, from people in scl3
$ curl -sI http://ftp.mozilla.org/pub/mozilla.org/firefox/releases/24.3.0esr/update/mac/zh-TW/firefox-24.3.0esr.complete.mar | egrep 'Last-Modified|Content-Length'
Last-Modified: Sat, 01 Feb 2014 00:12:04 GMT
Content-Length: 44344063

# the CDN, requesting from people in scl3
$ curl -sI http://download.cdn.mozilla.net/pub/firefox/releases/24.3.0esr/update/mac/zh-TW/firefox-24.3.0esr.complete.mar | egrep 'Last-Modified|Content-Length'
Last-Modified: Wed, 29 Jan 2014 03:10:47 GMT
Content-Length: 44345540

# the CDN from my home, out in the internet boonies
Last-Modified: Sat, 01 Feb 2014 00:12:04 GMT
Content-Length: 44344063

Getting Akamai for both CDN requests (based on nslookup), but obviously different parts of their network.
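Comparisons like the one above can be scripted by parsing the `curl -sI` output and diffing the fields of interest. A minimal Python sketch; `parse_headers` and `diff_copies` are hypothetical names, and by default only the two fields grepped for above are compared:

```python
from __future__ import annotations


def parse_headers(raw: str) -> dict[str, str]:
    """Parse `curl -sI`-style header lines into a dict keyed by lower-cased name."""
    headers = {}
    for line in raw.splitlines():
        name, sep, value = line.partition(":")  # status lines have no colon
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


def diff_copies(a: str, b: str, fields=("last-modified", "content-length")) -> dict:
    """Return {field: (a_value, b_value)} for each field where the two copies disagree."""
    ha, hb = parse_headers(a), parse_headers(b)
    return {f: (ha.get(f), hb.get(f)) for f in fields if ha.get(f) != hb.get(f)}
```

Fed the ftp.mozilla.org and scl3 CDN outputs above, it reports both fields as different (44344063 vs 44345540 bytes); fed two matching copies, it returns an empty dict.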
Also, I don't know if jakem translated the path given in bug 901734, but it has an extra 'mozilla.org' in it.
at this point, i have purged the following:

 1. all *24.3.0esr* path matches on our load balancers (download-origin)
 2. all /pub/firefox/releases/24.3.0esr directories recursively at akamai, edgecast and highwinds


(In reply to Nick Thomas [:nthomas] from comment #17)
> 
> # the CDN, requesting from people in scl3
> $ curl -sI http://download.cdn.mozilla.net/pub/firefox/releases/24.3.0esr/update/mac/zh-TW/firefox-24.3.0esr.complete.mar | egrep 'Last-Modified|Content-Length'
> Last-Modified: Wed, 29 Jan 2014 03:10:47 GMT
> Content-Length: 44345540

hmm.. i am getting something different than you there:

[cturra@people1.dmz.scl3 ~]$ date
Mon Feb  3 14:34:32 PST 2014
[cturra@people1.dmz.scl3 ~]$ curl -sI http://download.cdn.mozilla.net/pub/mozilla.org/firefox/releases/24.3.0esr/update/mac/zh-TW/firefox-24.3.0esr.complete.mar | grep "Content-Length"
Content-Length: 44344063


(In reply to Nick Thomas [:nthomas] from comment #18)
> Also, I don't know if jakem translated the path given in bug 901734, but it
> has an extra 'mozilla.org' in it.

looks like there is a rewrite rule for this on the ftp cluster.

  RewriteEngine on
  RewriteRule ^/pub/mozilla\.org.* - [L]
  RewriteRule ^/pub/(.*)$ https://ftp.mozilla.org/pub/mozilla.org/$1 [R]
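Read as code, the two rules say: leave anything already under /pub/mozilla.org alone, and redirect any other /pub/ path to the same path with mozilla.org inserted. A Python sketch that mimics only that path matching, not Apache's request handling (`ftp_rewrite` is a hypothetical name):

```python
from __future__ import annotations

import re


def ftp_rewrite(path: str) -> str | None:
    """Return the redirect target for `path`, or None if it is served as-is."""
    # Rule 1: "-" means no substitution and [L] stops processing.
    if re.match(r"^/pub/mozilla\.org", path):
        return None
    # Rule 2: [R] sends an external redirect with mozilla.org inserted.
    m = re.match(r"^/pub/(.*)$", path)
    if m:
        return "https://ftp.mozilla.org/pub/mozilla.org/" + m.group(1)
    return None
```

So both spellings of the path end up at the same file on the ftp cluster, which is why the extra mozilla.org does not break downloads there.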
(In reply to Chris Turra [:cturra] from comment #19)
> hmm.. i am getting something different than you there:
> 
> [cturra@people1.dmz.scl3 ~]$ date
> Mon Feb  3 14:34:32 PST 2014
> [cturra@people1.dmz.scl3 ~]$ curl -sI http://download.cdn.mozilla.net/pub/mozilla.org/firefox/releases/24.3.0esr/update/mac/zh-TW/firefox-24.3.0esr.complete.mar | grep "Content-Length"
> Content-Length: 44344063

I reran mine and got the wrong size again, but now it's consistently 44344063. I'm gonna wave my hands and say we were flopping between different nodes depending on the exact time the requests were made, and now Akamai has actioned the purge request.

Will rerun our test in a while, to give the CDN purge a little more time.
Still got a failure, slightly different:
http://ftp.mozilla.org/pub/mozilla.org/firefox/candidates/24.3.0esr-candidates/build2/logs/release-mozilla-esr24-final_verification-bm82-build1-build0.txt.gz

It's a request that is served by edgecast,
http://download.cdn.mozilla.net/pub/firefox/releases/24.3.0esr/update/linux-x86_64/ca/firefox-24.3.0esr.complete.mar
returning the wrong size. Repeating this on the same machine (bld-centos6-hp-017.build.scl1.m.c) gets me results that change from request to request. 

Could this be anything on our side? I'm not aware of any proxying going on for the RelEng network, but maybe. Otherwise, can we reach out to edgecast?
Flags: needinfo?(rail)
(In reply to Nick Thomas [:nthomas] from comment #21)
> 
> It's a request that is served by edgecast,
> http://download.cdn.mozilla.net/pub/firefox/releases/24.3.0esr/update/linux-x86_64/ca/firefox-24.3.0esr.complete.mar
> returning the wrong size. Repeating this on the same machine
> (bld-centos6-hp-017.build.scl1.m.c) gets me results that change from request
> to request. 

i think i may have found something after reading through edgecast's purge documents. it turns out that you need to append /* to your purge request for a recursive purge (that's obvious, right?). anyway, i just submitted a new purge for the following:

  /pub/firefox/releases/24.3.0esr/*


this /should/ have the edgecast bit sorted out. let's test again first thing in the morning to confirm, but i'm more confident in this purge submission.
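The difference between the earlier purge and this one can be modeled as follows. This is a simplified Python sketch of the behavior described in this comment, not Edgecast's documented matching algorithm: a pattern ending in /* matches everything below that directory, while a pattern without it matches only that exact path. `edgecast_style_match` is a hypothetical name.

```python
def edgecast_style_match(pattern: str, path: str) -> bool:
    """Would a purge submitted as `pattern` remove the cached object at `path`?"""
    if pattern.endswith("/*"):
        # Recursive purge: keep the trailing slash and match everything below it.
        return path.startswith(pattern[:-1])
    # Without the wildcard only the exact object is purged.
    return path == pattern
```

Under this model the earlier directory-only purge never touched the MAR files, while /pub/firefox/releases/24.3.0esr/* covers every file beneath that release directory.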


> Can this be anything our side ? I'm not aware of any proxying going on for
> the RelEng network, but maybe. 

nope, i'm certain this is not something on our end. the only touch point for our CDNs is the download-origin services which are fronted by zeus (i cleared all cached objects matching *24.3.0esr* this afternoon) and backed by the ftp cluster (no local caching).
i unfortunately cannot reproduce this directly through any of our CDNs. the examples below expect a content length of 28131355, per your previous test...

akamai:
$ HOST='download-akamai.cdn.mozilla.net'; curl -sI http://$HOST/pub/firefox/releases/24.3.0esr/update/win32/zh-TW/firefox-24.3.0esr.complete.mar | grep Length
Content-Length: 28131355

highwinds:
$ HOST='download-highwinds.cdn.mozilla.net'; curl -sI http://$HOST/pub/firefox/releases/24.3.0esr/update/win32/zh-TW/firefox-24.3.0esr.complete.mar | grep Length
Content-Length: 28131355

edgecast:
$ HOST='wpc.1237.edgecastcdn.net/801237/download.cdn.mozilla.net'; curl -sI http://$HOST/pub/firefox/releases/24.3.0esr/update/win32/zh-TW/firefox-24.3.0esr.complete.mar | grep Length
Content-Length: 28131355

zeus:
$ HOST='download-origin.cdn.mozilla.net'; curl -sI http://$HOST/pub/firefox/releases/24.3.0esr/update/win32/zh-TW/firefox-24.3.0esr.complete.mar | grep Length
Content-Length: 28131355
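The four checks above differ only in the host, so they can be generated from one table. A Python sketch using the standard library's urllib; the hostnames are copied from the commands above (they date from 2014 and may no longer resolve), and `CDN_HOSTS`, `head_requests` and `content_lengths` are hypothetical names. Like the curl commands, it sends HEAD requests.

```python
from __future__ import annotations

from urllib.request import Request, urlopen

# Copied from the commands above; the Edgecast entry carries a path prefix,
# so URLs are built by plain string joining.
CDN_HOSTS = {
    "akamai": "download-akamai.cdn.mozilla.net",
    "highwinds": "download-highwinds.cdn.mozilla.net",
    "edgecast": "wpc.1237.edgecastcdn.net/801237/download.cdn.mozilla.net",
    "zeus": "download-origin.cdn.mozilla.net",
}


def head_requests(path: str) -> dict[str, Request]:
    """Build one HEAD request per CDN for the same object path."""
    return {name: Request(f"http://{host}{path}", method="HEAD")
            for name, host in CDN_HOSTS.items()}


def content_lengths(path: str, timeout: float = 10.0) -> dict[str, str | None]:
    """Send the HEAD requests and collect each vendor's Content-Length (needs network)."""
    results = {}
    for name, req in head_requests(path).items():
        with urlopen(req, timeout=timeout) as resp:
            results[name] = resp.headers.get("Content-Length")
    return results
```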


to be able to investigate this further, we're going to need specific IP address(es) for where these results are coming back from.
Flags: needinfo?(rail)
Depends on: 967650
Attached file log.gz
Looks like all of the IPs with failures are Akamai. :/
Flags: needinfo?(rail)
i've opened a support case with akamai, since purging is the only action we have access to (and we've done that a number of times now). will report back when i have more information to provide.
Thanks
Attached file log2.gz
Another one...
:rail i got the following response from akamai on this issue. they suggest not using HEAD requests to validate content like we've been doing. do you think we can update the tests to make GET requests instead for these content-length validations?

"""
I think I know what's going on now. I just requested that object and got the correct content length. Here is the problem: your 'curls' are being made with a HEAD request. One of the parameters that our Edges use to decide whether an object is different is the request method. In this instance, you will get different results if you do a HEAD and a GET. Our Edge handles them as two different objects and, more importantly, it won't revalidate against the origin because only headers are being requested. What I would suggest is that you change the way you check the content on our platform. Instead of sending an "-I" flag, do something like:

curl -D - -s -o /dev/null http://download.cdn.mozilla.net/pub/firefox/releases/24.3.0esr/update/mac/zh-TW/firefox-24.3.0esr.complete.mar

Our Edges react differently to GET and HEAD requests. Try running the above request against the same Edges that failed (you can spoof your hosts file for that), but send a GET. You should get the proper size in response. I was careful to run another request with a HEAD, and I can reproduce it now. But it looks like your origin, at least in this instance, replied to the Edge with that Content-Length on January 30th at 00:45 GMT (this one should be 2420851 bytes):

HTTP Headers were swapped in:

HTTP/1.1 200 OK
Server: Apache
X-Backend-Server: ftp6.dmz.scl3.mozilla.com
Cache-Control: max-age=604800
Content-Type: application/octet-stream
Date: Thu, 30 Jan 2014 00:45:15 GMT
Expires: Thu, 06 Feb 2014 00:45:15 GMT
Accept-Ranges: bytes
Access-Control-Allow-Origin: *
ETag: "42484e6-254b5b-4f11349857bc0"
Last-Modified: Wed, 29 Jan 2014 03:10:47 GMT
Content-Length: 2444123
X-Cache-Info: cached
Age: 0
X-Akamai-Purge-Seq-Num: 4721689
Connection: keep-alive

This doesn't mean that the file was wrong, just that the HEAD response we got said that.
"""
Flags: needinfo?(rail)
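For comparison, a Python sketch of Akamai's suggested check: a full GET whose body is read and thrown away, returning both the advertised Content-Length and the number of bytes that actually arrived. `drain` and `get_size` are hypothetical helpers; note that this transfers the whole file, which is the cost raised in the next comment.

```python
from __future__ import annotations

from urllib.request import Request, urlopen


def drain(stream, chunk_size: int = 64 * 1024) -> int:
    """Read a file-like body to EOF, discarding the data, and return its byte count."""
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return total
        total += len(chunk)


def get_size(url: str, timeout: float = 30.0) -> tuple[str | None, int]:
    """GET the object, like `curl -D - -s -o /dev/null`, and return the advertised
    Content-Length alongside the number of body bytes actually received."""
    with urlopen(Request(url), timeout=timeout) as resp:
        return resp.headers.get("Content-Length"), drain(resp)
```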
We used to do that, but it's a lot of traffic and we use HEAD as an optimization to speed up the release automation. Could you ask if a GET using a short Range request is treated the same as a GET for the full file?
Flags: needinfo?(rail)
Or, you know, they could remove matching objects in both caches when people make purge requests. :-P
From http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html:

--------------

9.4 HEAD

The HEAD method is identical to GET except that the server MUST NOT return a message-body in the response. The metainformation contained in the HTTP headers in response to a HEAD request SHOULD be identical to the information sent in response to a GET request.

--------------

The HTTP specification is quite clear here: if the GET and HEAD requests are returning HTTP headers, it means the service is violating the terms of the HTTP protocol. If possible, I think this should be fixed on the (faulty) CDN side, rather than implementing workarounds on the (correctly implemented) Mozilla side. =)
^^^^ if the GET and HEAD requests are returning *different* HTTP headers ^^^^
Rail, do you want me to add -H "Pragma: akamai-x-cache-on, akamai-x-cache-remote-on, akamai-x-check-cacheable, akamai-x-get-cache-key, akamai-x-get-extracted-values, akamai-x-get-nonces, akamai-x-get-ssl-client-session-id, akamai-x-get-true-cache-key, akamai-x-serial-no, akamai-x-get-request-id" permanently to the curl requests in our final verification scripts, as per Eric Gonzalez Mora's suggestion, or is that redundant now?

Thanks,
Pete
Flags: needinfo?(rail)
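If the debug header were added, building such a request in Python might look like the sketch below. The Pragma values are copied verbatim from the comment above; as far as we know they ask Akamai edges to echo cache details (for example X-Cache and X-Cache-Key) in the response. `AKAMAI_DEBUG_PRAGMAS` and `debug_request` are hypothetical names.

```python
from urllib.request import Request

# Pragma values copied from the comment above.
AKAMAI_DEBUG_PRAGMAS = [
    "akamai-x-cache-on",
    "akamai-x-cache-remote-on",
    "akamai-x-check-cacheable",
    "akamai-x-get-cache-key",
    "akamai-x-get-extracted-values",
    "akamai-x-get-nonces",
    "akamai-x-get-ssl-client-session-id",
    "akamai-x-get-true-cache-key",
    "akamai-x-serial-no",
    "akamai-x-get-request-id",
]


def debug_request(url: str, method: str = "HEAD") -> Request:
    """Build a request carrying the Akamai debug Pragma header."""
    req = Request(url, method=method)
    req.add_header("Pragma", ", ".join(AKAMAI_DEBUG_PRAGMAS))
    return req
```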
If we can't use HEAD for final verification I'd try Nick's idea in comment 30 and use GET and Range for cases when HEAD fails.
Flags: needinfo?(rail)
(In reply to Rail Aliiev [:rail] from comment #35)
> If we can't use HEAD for final verification I'd try Nick's idea in comment
> 30 and use GET and Range for cases when HEAD fails.

i agree, that was exactly what i was thinking.


(In reply to Nick Thomas [:nthomas] from comment #31)
> Or, you know, they could remove matching objects in both caches when people
> make purge requests. :-P

i will also make this suggestion, but obviously cannot make any promises about an action item there ;)
Semi-related - the purge in bug 968387 doesn't show up in sentry (for bouncer) because it's also doing HEAD requests against Akamai:

Log entry for Mozilla Main CDN - non-SSL [486] (http://download.cdn.mozilla.net/pub) at 2014-02-06 05:05:21 UTC

Note: a FAILED/404 result on a file which is not included in the mozilla-current
      module is okay if you are only rsyncing mozilla-current.

Checking mirror download.cdn.mozilla.net ...
download.cdn.mozilla.net.	300	IN	CNAME	2-01-2967-0010.cdx.cedexis.net.
2-01-2967-0010.cdx.cedexis.net.	9	IN	CNAME	wildcard.cdn.mozilla.net.edgesuite.net.
wildcard.cdn.mozilla.net.edgesuite.net.	5700	IN	CNAME	a1284.g.akamai.net.
a1284.g.akamai.net.	20	IN	A	184.51.0.41
a1284.g.akamai.net.	20	IN	A	184.51.0.48
....
[2014-02-05 21:05:19 -0800] /firefox/releases/28.0b1/win32-EUballot/sv-SE/Firefox%20Setup%2028.0b1.exe... okay.
[2014-02-05 21:05:19 -0800] /firefox/releases/28.0b1/linux-i686/zh-TW/firefox-28.0b1.tar.bz2... okay.
[2014-02-05 21:05:19 -0800] /firefox/releases/28.0b1/update/linux-i686/zh-TW/firefox-28.0b1.complete.mar... okay.
...
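The DNS part of this log shows how download.cdn.mozilla.net reaches a vendor: a CNAME to a cedexis.net name, then to Akamai's edgesuite.net and akamai.net names, then A records. A Python sketch that follows such a chain from dig-style answer lines, which could help record which CDN (and which IPs) served a failing request; `resolve_chain` is a hypothetical helper, and it only parses text, it does not query DNS.

```python
from __future__ import annotations


def resolve_chain(dig_lines: list[str], name: str) -> tuple[list[str], list[str]]:
    """Follow CNAMEs from `name` through "owner TTL class type value" lines.

    Returns the chain of names visited and the A records of the final name.
    """
    cnames, addrs = {}, {}
    for line in dig_lines:
        owner, _ttl, _cls, rtype, value = line.split()
        if rtype == "CNAME":
            cnames[owner] = value
        elif rtype == "A":
            addrs.setdefault(owner, []).append(value)
    chain = [name]
    while chain[-1] in cnames and len(chain) < 16:  # guard against CNAME loops
        chain.append(cnames[chain[-1]])
    return chain, addrs.get(chain[-1], [])
```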
(In reply to Rail Aliiev [:rail] from comment #35)
> If we can't use HEAD for final verification I'd try Nick's idea in comment
> 30 and use GET and Range for cases when HEAD fails.

i received confirmation from Akamai that a byte range GET request will work. 

can i get feedback from the group on whether this specific bug can be marked as r/fixed? i think the only outstanding action item here is to add a check to the tests so that, if the content-length is incorrect on a HEAD request, a second check is performed with a byte-range GET request.
Flags: needinfo?(rail)
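A sketch of that fallback in Python, assuming the behavior Akamai confirmed: HEAD first and, only if the Content-Length disagrees, a GET for a single byte (Range: bytes=0-0), whose 206 response reports the full object size after the slash in Content-Range (for example "bytes 0-0/3612284"). `verify_size` and `total_from_content_range` are hypothetical names, not code from the release scripts.

```python
from __future__ import annotations

from urllib.request import Request, urlopen


def total_from_content_range(value: str) -> int | None:
    """Return the complete length from a Content-Range value like "bytes 0-0/3612284"."""
    unit, _, rest = value.partition(" ")
    if unit != "bytes" or "/" not in rest:
        return None
    total = rest.rsplit("/", 1)[1]
    return int(total) if total.isdigit() else None  # "*" means the length is unknown


def verify_size(url: str, expected: int, timeout: float = 30.0) -> bool:
    """HEAD first; if its Content-Length disagrees, re-check with a one-byte range GET."""
    with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
        if resp.headers.get("Content-Length") == str(expected):
            return True
    ranged = Request(url, headers={"Range": "bytes=0-0"})
    with urlopen(ranged, timeout=timeout) as resp:
        if resp.status == 206:
            return total_from_content_range(resp.headers.get("Content-Range", "")) == expected
        # The server ignored Range and sent a plain 200; fall back to its
        # Content-Length without reading the body.
        return resp.headers.get("Content-Length") == str(expected)
```

On the fallback path only one byte of body is requested, which keeps most of the speed advantage of HEAD.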
Yeah, I think it can be closed since there is nothing we can do to purge the HEAD objects.
Status: REOPENED → RESOLVED
Closed: 10 years ago
Flags: needinfo?(rail)
Resolution: --- → FIXED
Filed bug 1100179 to modify our scripts to use GETs.
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard