Closed Bug 1122247 Opened 9 years ago Closed 9 years ago

Hacks blog sometimes returns cached feed

Categories

(Infrastructure & Operations :: IT-Managed Tools, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: andershol, Assigned: Atoll)

References

Details

(Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/744] )

If you shift+reload view-source:https://hacks.mozilla.org/feed/ in a browser, you seem to get one of two different XML files. Which file you get seems to be correlated with the contents of the "X-Backend-Server" HTTP header:

generic1.webapp.phx1.mozilla.com,
generic2.webapp.phx1.mozilla.com,
generic4.webapp.phx1.mozilla.com:
The lastBuildDate-field contains "Mon, 12 Jan 2015 19:23:21 +0000", there is no "WP-Super-Cache"-http-header and latest item is "2014: Mozilla Hacks looks back" with pubDate "Mon, 29 Dec 2014 15:00:27 +0000".

generic3.webapp.phx1.mozilla.com,
generic5.webapp.phx1.mozilla.com,
generic6.webapp.phx1.mozilla.com:
The lastBuildDate-field contains "Wed, 26 Nov 2014 16:42:39 +0000", the "WP-Super-Cache"-http-header contains "Served legacy cache file", and the latest item is "Save the Web – Be a Ford-Mozilla Open Web Fellow" with pubDate "Wed, 26 Nov 2014 16:42:39 +0000".

So it seems that there is some weird caching going on. Which server returns which content might be random, but this is what I observed.

I suspect the problem is related to the migration of the hacks blog to b.m.o (https://wiki.mozilla.org/HacksPostMigrationUserDetails), based on the dates in the feed and the approximate time the problem seems to have started.
Blocks: 1110273
Doesn't seem to be a problem anymore.
Status: UNCONFIRMED → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME
Running a shell script like this for a few minutes:

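# Fetch the feed every 5 seconds and save each response (headers + body)
# under a name that records the backend server and the MD5 of the response.
while true
do
  wget -qSO- https://hacks.mozilla.org/feed/ >temp.txt 2>&1
  # Short backend name, e.g. "generic3", taken from the X-Backend-Server header
  n=$(grep "X-Backend-Server" temp.txt | head -1 | cut -d: -f2 | cut -d. -f1 | sed -e 's/ //g')
  s=$(md5sum temp.txt | cut -c1-32)
  echo "$(date +"%Y-%m-%d_%H-%M-%S") feed-$n-$s.txt" >> log.txt
  mv temp.txt "feed-$n-$s.txt"
  sleep 5
done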

And summarizing gives:

$ grep lastBuildDate * | sed -e 's/-/ /g' | sed -e 's/\t/ /g' | cut -d' ' -f2,4- | sort -u
generic1 <lastBuildDate>Wed, 11 Mar 2015 15:56:19 +0000</lastBuildDate>
generic2 <lastBuildDate>Wed, 11 Mar 2015 15:56:19 +0000</lastBuildDate>
generic3 <lastBuildDate>Wed, 26 Nov 2014 16:42:39 +0000</lastBuildDate>
generic4 <lastBuildDate>Wed, 11 Mar 2015 15:56:19 +0000</lastBuildDate>
generic5 <lastBuildDate>Wed, 26 Nov 2014 16:42:39 +0000</lastBuildDate>
generic6 <lastBuildDate>Wed, 26 Nov 2014 16:42:39 +0000</lastBuildDate>

$ grep '<generator>' * | sed -e 's/-/ /g' | sed -e 's/\t/ /g' | cut -d' ' -f2,4- | sort -u
generic1 <generator>http://wordpress.org/?v=4.1</generator>
generic2 <generator>http://wordpress.org/?v=4.1</generator>
generic3 <generator>http://wordpress.org/?v=4.0.1</generator>
generic4 <generator>http://wordpress.org/?v=4.1</generator>
generic5 <generator>http://wordpress.org/?v=4.0.1</generator>
generic6 <generator>http://wordpress.org/?v=4.0.1</generator>
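The two summaries above can be condensed further by grouping the backends under each distinct value. The following is only a sketch: the function name `group_by_build_date` is hypothetical, and it assumes input lines of the form `genericN <lastBuildDate>...</lastBuildDate>`, exactly as printed by the first pipeline above.

```shell
#!/bin/sh
# group_by_build_date (hypothetical helper): read lines of the form
#   "backend <lastBuildDate>DATE</lastBuildDate>"
# on stdin and print each distinct DATE followed by the backends that served it.
group_by_build_date() {
  sed -e 's/<lastBuildDate>//' -e 's/<\/lastBuildDate>//' |
  awk '{
    backend = $1
    $1 = ""               # drop the backend name from the record
    sub(/^ +/, "")        # trim the separator left behind
    if ($0 in seen) {
      seen[$0] = seen[$0] " " backend
    } else {
      seen[$0] = backend
      order[++n] = $0     # remember first-seen order of the dates
    }
  }
  END { for (i = 1; i <= n; i++) print order[i] " -> " seen[order[i]] }'
}
```

It could be appended to the first pipeline above, after `sort -u`, to get one line per distinct lastBuildDate.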
Status: RESOLVED → UNCONFIRMED
Resolution: WORKSFORME → ---
Moving this bug to webops and adding Sean Rich.

Sean, this looks like the same feed cache issue we had discussed via email. Looks like a few webheads still have really old caches?
Assignee: nobody → server-ops-webops
Component: blog.mozilla.org → WebOps: IT-Managed Tools
Product: Websites → Infrastructure & Operations
QA Contact: smani
Version: unspecified → other
Craig,

We'll take a look at this.
Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/744]
I adapted the wget for internal use and verified this issue, exposing an additional detail:

$ for i in generic{1..6}.webapp.phx1.mozilla.com; do wget --header="Host: hacks.mozilla.org" -qSO- http://${i}:81/feed/ 2>&1 | egrep -i '(x-backend|last-modified|etag:|super.?cache)'; echo; done
  X-Backend-Server: generic1.webapp.phx1.mozilla.com
  Last-Modified: Wed, 11 Mar 2015 15:56:19 GMT
  ETag: "811507e5ecf9179b0f673acf6bcbf181"

  X-Backend-Server: generic2.webapp.phx1.mozilla.com
  Last-Modified: Wed, 11 Mar 2015 15:56:19 GMT
  ETag: "811507e5ecf9179b0f673acf6bcbf181"

  X-Backend-Server: generic3.webapp.phx1.mozilla.com
  ETag: "6fc1c553cb8f9d059f94dd514ae127f1"
  WP-Super-Cache: Served legacy cache file
<!-- Cached page generated by WP-Super-Cache on 2014-11-27 01:04:43 -->

  X-Backend-Server: generic4.webapp.phx1.mozilla.com
  Last-Modified: Wed, 11 Mar 2015 15:56:19 GMT
  ETag: "811507e5ecf9179b0f673acf6bcbf181"

  X-Backend-Server: generic5.webapp.phx1.mozilla.com
  ETag: "6fc1c553cb8f9d059f94dd514ae127f1"
  WP-Super-Cache: Served legacy cache file
<!-- Cached page generated by WP-Super-Cache on 2014-11-27 00:47:17 -->

  X-Backend-Server: generic6.webapp.phx1.mozilla.com
  ETag: "6fc1c553cb8f9d059f94dd514ae127f1"
  WP-Super-Cache: Served legacy cache file
<!-- Cached page generated by WP-Super-Cache on 2014-11-27 00:52:48 -->
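For anyone repeating this check, the affected backends can be picked out of a header dump like the one above mechanically. This is a rough sketch, not the command actually used here: the function name `legacy_cache_backends` is hypothetical, and it assumes each backend's "X-Backend-Server" line appears before its "WP-Super-Cache" line, as in the output above.

```shell
#!/bin/sh
# legacy_cache_backends (hypothetical helper): read a header dump on stdin and
# print each backend whose response carried the legacy cache header.
legacy_cache_backends() {
  awk '
    # Remember the most recent backend name.
    tolower($1) == "x-backend-server:" { backend = $2; next }
    # The legacy header belongs to the backend seen just before it.
    tolower($0) ~ /wp-super-cache: served legacy cache file/ && backend != "" {
      print backend
    }
  '
}
```

Piping the output of the `for` loop above into `legacy_cache_backends` would list the backends still serving the legacy cache file.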

[1] https://wordpress.org/plugins/wp-super-cache/faq/

> Legacy cached files will have the header, "WP-Super-Cache: Served legacy cache file".

[2] http://z9.io/wp-super-cache-developers/

> The plugin operates in three modes. Mod_rewrite, PHP and LEGACY. In the first two modes static cache files are created in the supercache cache folder

[3] https://github.com/Automattic/wp-super-cache

> 3. Legacy caching. This is mainly used to cache pages for known users. These are logged in users, visitors who leave comments or those who should be shown custom per-user data.

> If you're new to caching use PHP caching. It's easy to set up and very fast. Avoid legacy caching if you can.

If the feed should always be the same for *all* users, regardless of admin-or-not, logged-in-or-not, and so forth, then the use of 'legacy' caching mode here is almost certainly incorrect.

So, to ask a rather odd question:

Should the hacks.mozilla.org RSS feed be the same for *all* users, regardless of whether they're logged in or not?

I'm not sure who to ask, so setting a couple needinfo?'s here.
Flags: needinfo?(craigcook.bugz)
Flags: needinfo?(andershol)
The legacy cache header was mentioned in comment 0, but not its implications, of course. Good catch.

An RSS reader will never (dangerous word to use) be logged in. So if you are logged in and are looking at the RSS feed (e.g. for debugging), you really want to see what it would look like if you weren't logged in. So it should always be the same, and it should always be rendered as it would be for a non-logged-in user. So I think you want to use the "rewrite" method.
Flags: needinfo?(andershol)
But super-cache won't only be used for the RSS feed, so the decision will affect all pages. But since the hacks site (as far as I can see) isn't a site where normal users are supposed to sign up, normally very few people (the editors) will use the site while signed in. So it will still make sense to cache only what the non-signed-in users see. So "rewrite" should still be used (especially because the speed benefit of this method is great, I believe).
Note that the fluctuation between a very old feed and the current feed may currently cause the changed posts to be regularly added again to the RSS readers of subscribers (or subscribers to planet). So if you could make a quick fix (e.g. clear the cache) to get the very old feeds out of circulation, it would be great.
If a blog published private posts, I could see that as a valid reason for the feed to be different for logged-in users. Hacks doesn't have private posts so for this blog I would expect to see the same content whether I'm logged in or not.

For what it's worth, WP Super Cache is currently set to PHP, but perhaps it was set to Legacy once upon a time when that stale cache was generated?
Flags: needinfo?(craigcook.bugz)
Assignee: server-ops-webops → rsoderberg
On each of generic3,5,6, I removed the stale cache file and its metadata file from disk:

In /data/www/blog.mozilla.org/wp-content/cache/blogs/ --

.../feed/meta/wp-cache-feed303ebe4a7751ef9b055510efab531d1a.meta
.../feed/wp-cache-feed303ebe4a7751ef9b055510efab531d1a.html
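A dry-run listing is one way to spot other leftovers of this kind before removing anything. This is a minimal sketch and not the command used here: the function name `list_stale_cache_files` and the age threshold are assumptions, the `wp-cache-*` pattern is taken from the file names above, and it only prints candidates.

```shell
#!/bin/sh
# list_stale_cache_files DIR DAYS (hypothetical helper): print wp-cache-* files
# under DIR that were last modified more than DAYS days ago. Nothing is deleted.
list_stale_cache_files() {
  dir=$1
  days=$2
  find "$dir" -type f -name 'wp-cache-*' -mtime +"$days" -print
}
```

For example, `list_stale_cache_files /path/to/cache 30` would list cache and meta files older than 30 days under a cache directory of your choosing.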

And now all 7 servers check out cleanly:

$ for i in generic{1..7}.webapp.phx1.mozilla.com; do wget --header="Host: hacks.mozilla.org" -qSO- http://${i}:81/feed/ 2>&1 | egrep -i '(x-backend|last-modified|etag:|super.?cache)'; echo; done | grep ETag | sort | uniq -c

   7   ETag: "a67edd4fc874e9db43e89ec5a0c128ed"
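The same comparison can be turned into a pass/fail check suitable for a script. This is a sketch under assumptions: the function name `etags_consistent` is hypothetical, and it expects header lines like those printed by the loop above on stdin.

```shell
#!/bin/sh
# etags_consistent (hypothetical helper): read header lines on stdin and
# succeed only if exactly one distinct ETag value appears among them.
etags_consistent() {
  count=$(grep -i 'etag:' | sed -e 's/^[[:space:]]*//' | sort -u | wc -l | tr -d ' ')
  [ "$count" -eq 1 ]
}
```

Piping the output of the `for` loop above into `etags_consistent` returns success when every backend reports the same ETag and failure otherwise (including when no ETag is seen at all).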

Let us know if you come across any further issues here - sorry for the delay!
Status: UNCONFIRMED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
The problem is probably solved now. But there still seems to be some strangeness in the webservers; not sure if it is relevant for you. Running this for the front page (which doesn't suppress the cache comments, as the RSS feed does):

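# Same loop as before, but fetching the front page instead of the feed.
while true
do
  wget -qSO- https://hacks.mozilla.org/ >temp.txt 2>&1
  # Short backend name, e.g. "generic7", taken from the X-Backend-Server header
  n=$(grep "X-Backend-Server" temp.txt | head -1 | cut -d: -f2 | cut -d. -f1 | sed -e 's/ //g')
  s=$(md5sum temp.txt | cut -c1-32)
  echo "$(date +"%Y-%m-%d_%H-%M-%S") feed-$n-$s.txt" >> log.txt
  mv temp.txt "feed-$n-$s.txt"
  sleep 5
done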

... and grepping:

$ grep '<!--.*Super' *.txt | sed -e 's/^\S*-\(\S*\)-[a-z0-9]*\.txt:\(.*\)/\1 \2/' | sort -u
generic1 <!-- Cached page generated by WP-Super-Cache on 2015-03-31 03:05:16 -->
generic1 <!-- Cached page generated by WP-Super-Cache on 2015-03-31 03:38:19 -->
generic2 <!-- Cached page generated by WP-Super-Cache on 2015-03-31 03:10:11 -->
generic3 <!-- Cached page generated by WP-Super-Cache on 2015-03-31 03:23:34 -->
generic4 <!-- Cached page generated by WP-Super-Cache on 2015-03-31 03:33:07 -->
generic5 <!-- Cached page generated by WP-Super-Cache on 2015-03-31 03:27:07 -->
generic6 <!-- Cached page generated by WP-Super-Cache on 2015-03-31 03:40:11 -->
generic7 <!-- File not cached! Super Cache Couldn't write to: /data/www/blog.mozilla.org/wp-content/cache/supercache/hacks.mozilla.org/647789608551a78dce507b2.96859406.tmp -->
generic7 <!-- File not cached! Super Cache Couldn't write to: /data/www/blog.mozilla.org/wp-content/cache/supercache/hacks.mozilla.org/667605288551a7767e6da71.82252770.tmp -->
generic7 <!-- File not cached! Super Cache Couldn't write to: /data/www/blog.mozilla.org/wp-content/cache/supercache/hacks.mozilla.org/799521471551a789dee9b78.98749755.tmp -->

Seems to show a problem on generic7.
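To make the affected backend stand out, the write failures can be counted per backend. This is only a sketch: the function name `cache_write_failures` is hypothetical, and it assumes input lines that start with the short backend name, as produced by the grep/sed pipeline above.

```shell
#!/bin/sh
# cache_write_failures (hypothetical helper): read "backend <!-- comment -->"
# lines on stdin and print "backend count" for each backend that reported a
# Super Cache write failure.
cache_write_failures() {
  # The "." stands in for the apostrophe, which cannot appear inside the
  # single-quoted awk program.
  awk '/File not cached! Super Cache Couldn.t write to:/ { count[$1]++ }
       END { for (b in count) print b, count[b] }' | sort
}
```

Fed the grep output above, this would print a single line for generic7 with the number of distinct failure comments seen.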
It does. I'll open another bug, cc'ing you, since that isn't this issue.