Closed Bug 1160609 Opened 9 years ago Closed 9 years ago

Two blogposts from Mozilla Security blog are missing in the Planet feed

Categories

(Websites :: planet.mozilla.org, defect)

Production
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: olivergill.mail, Assigned: nmaul)

"https://blog.mozilla.org/security/2015/04/02/distrusting-new-cnnic-certificates/" is the last blog post included in the Planet feed.

I can't find the two latest posts from "Mozilla Security" blog in the Planet feed:
- "https://blog.mozilla.org/security/2015/04/27/removing-e-guven-ca-certificate/" (2015-04-27)
- "https://blog.mozilla.org/security/2015/04/30/deprecating-non-secure-http/" (2015-04-30)
Flags: needinfo?(mhoye)
Planet says it's getting a 403 on that feed, which doesn't make a ton of sense.

Red rover, red rover, I call ops over.
Flags: needinfo?(mhoye) → needinfo?(achavez)
OK, it looks like Planet cannot see feeds from a number of mozilla-hosted blogs. Security, Community, several others.

We have a few upcoming announcements for which this functionality is quite important - coming up Wednesday, afaik - so I'd like to escalate this.
Status: UNCONFIRMED → NEW
Ever confirmed: true
The fetching is done on staticadm.private.phx1.mozilla.com, as part of /usr/local/bin/planet.sh

What it ends up doing for this chunk is:
cd /data/static/build/planet
/usr/bin/svn up -q
cd branches
cd planet
/usr/bin/python2.6 ../../trunk/planet.py config.ini

I added the -v to that planet.py call and got, for http://blog.mozilla.org/community/feed/ as an example (it does seem that all blog.mozilla.org URLs get a 403 response):

INFO:planet.runner:Fetching http://ops.mozilla-community.org/category/planet-mozilla/feed/ via 3
INFO:planet.runner:Fetching http://blog.mozilla.org/community/feed/ via 3
ERROR:planet.runner:Error 403 while updating feed http://blog.mozilla.org/community/feed/

I can fetch that URL with wget on staticadm. Trying to find the actual call method to debug further.
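
To check the hunch that every blog.mozilla.org feed is affected, the -v output can be grouped by host and status. This is a minimal sketch, assuming the error lines keep the "ERROR:planet.runner:Error <status> while updating feed <url>" shape seen above; the function name failures_by_host is made up for illustration and is not part of planet.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

# Pattern assumed from the sample -v output in this bug, not from the planet source.
ERROR_RE = re.compile(
    r"^ERROR:planet\.runner:Error (\d{3}) while updating feed (\S+)"
)


def failures_by_host(log_lines):
    """Group failing feed URLs by (host, HTTP status)."""
    grouped = defaultdict(list)
    for line in log_lines:
        match = ERROR_RE.match(line.strip())
        if match:
            status = int(match.group(1))
            url = match.group(2)
            grouped[(urlparse(url).netloc, status)].append(url)
    return dict(grouped)


if __name__ == "__main__":
    sample = [
        "INFO:planet.runner:Fetching http://blog.mozilla.org/community/feed/ via 3",
        "ERROR:planet.runner:Error 403 while updating feed http://blog.mozilla.org/community/feed/",
    ]
    for (host, status), urls in failures_by_host(sample).items():
        print(host, status, len(urls))
```

If a single host accounts for all of the 403s, that points at something on the serving side rather than at individual feeds.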
Other http requests from staticadm appear to the world as being from 63.245.216.227
http fetching code is in /data/static/build/planet/trunk/planet/spider.py
Blocks: 764986
A tcpdump of the conversation between the planet scripts and blog.mozilla.org shows lots of https traffic when, as I understand it, all of this should be fetched over http.
That's expected; about a sixth of the feeds we pull for Planet are HTTPS.
The URL is http:// but the traffic is https://; that's not expected.
Cert issue?

[root@staticadm.private.phx1 planet]# wget -v https://blog.mozilla.org/ejpbruel/feed/
--2015-05-05 08:56:32--  https://blog.mozilla.org/ejpbruel/feed/
Resolving blog.mozilla.org... 104.130.89.232
Connecting to blog.mozilla.org|104.130.89.232|:443... connected.
ERROR: no certificate subject alternative name matches
        requested host name “blog.mozilla.org”.
To connect to blog.mozilla.org insecurely, use ‘--no-check-certificate’.
Might be able to get this working in the short term by switching all the fetches for blog.mozilla.org to http.
That may solve some of the problems, but http://blog.mozilla.org/community/feed/ is the one I'm specifically worried about, and it's not https.
I can see the request for http://blog.mozilla.org/community/feed/ in the packet dump I have; it looks valid and is definitely getting a 403 response. This needs to be looked at on the wpengine side to see why.
No.     Time        Source                Destination           Protocol Length Info
    668 40.509362   10.8.75.75            104.130.89.232        HTTP     190    GET /community/feed/ HTTP/1.1 

Frame 668: 190 bytes on wire (1520 bits), 190 bytes captured (1520 bits)
Ethernet II, Src: Vmware_94:68:8d (00:50:56:94:68:8d), Dst: Netscreen_ff:10:00 (00:10:db:ff:10:00)
Internet Protocol Version 4, Src: 10.8.75.75 (10.8.75.75), Dst: 104.130.89.232 (104.130.89.232)
Transmission Control Protocol, Src Port: 37708 (37708), Dst Port: 80 (80), Seq: 1, Ack: 1, Len: 124
    Source Port: 37708 (37708)
    Destination Port: 80 (80)
    [Stream index: 36]
    [TCP Segment Len: 124]
    Sequence number: 1    (relative sequence number)
    [Next sequence number: 125    (relative sequence number)]
    Acknowledgment number: 1    (relative ack number)
    Header Length: 32 bytes
    .... 0000 0001 1000 = Flags: 0x018 (PSH, ACK)
    Window size value: 115
    [Calculated window size: 14720]
    [Window size scaling factor: 128]
    Checksum: 0x1860 [validation disabled]
    Urgent pointer: 0
    Options: (12 bytes), No-Operation (NOP), No-Operation (NOP), Timestamps
    [SEQ/ACK analysis]
Hypertext Transfer Protocol
    GET /community/feed/ HTTP/1.1\r\n
        [Expert Info (Chat/Sequence): GET /community/feed/ HTTP/1.1\r\n]
        Request Method: GET
        Request URI: /community/feed/
        Request Version: HTTP/1.1
    Host: blog.mozilla.org\r\n
    accept-encoding: deflate, gzip\r\n
    user-agent: Python-httplib2/$Rev$\r\n
    \r\n
    [Full request URI: http://blog.mozilla.org/community/feed/]
    [HTTP request 1/1]
    [Response in frame: 670]
No.     Time        Source                Destination           Protocol Length Info
    670 40.564501   104.130.89.232        10.8.75.75            HTTP     473    HTTP/1.1 403 Forbidden  (text/html)

Frame 670: 473 bytes on wire (3784 bits), 473 bytes captured (3784 bits)
Ethernet II, Src: Netscreen_ff:10:00 (00:10:db:ff:10:00), Dst: Vmware_94:68:8d (00:50:56:94:68:8d)
Internet Protocol Version 4, Src: 104.130.89.232 (104.130.89.232), Dst: 10.8.75.75 (10.8.75.75)
Transmission Control Protocol, Src Port: 80 (80), Dst Port: 37708 (37708), Seq: 1, Ack: 125, Len: 407
    Source Port: 80 (80)
    Destination Port: 37708 (37708)
    [Stream index: 36]
    [TCP Segment Len: 407]
    Sequence number: 1    (relative sequence number)
    [Next sequence number: 408    (relative sequence number)]
    Acknowledgment number: 125    (relative ack number)
    Header Length: 32 bytes
    .... 0000 0001 1000 = Flags: 0x018 (PSH, ACK)
    Window size value: 29
    [Calculated window size: 14848]
    [Window size scaling factor: 512]
    Checksum: 0x1ba0 [validation disabled]
    Urgent pointer: 0
    Options: (12 bytes), No-Operation (NOP), No-Operation (NOP), Timestamps
    [SEQ/ACK analysis]
Hypertext Transfer Protocol
    HTTP/1.1 403 Forbidden\r\n
        [Expert Info (Chat/Sequence): HTTP/1.1 403 Forbidden\r\n]
        Request Version: HTTP/1.1
        Status Code: 403
        Response Phrase: Forbidden
    Server: nginx\r\n
    Content-Type: text/html\r\n
    Date: Tue, 05 May 2015 15:35:42 GMT\r\n
    Keep-Alive: timeout=20\r\n
    Connection: keep-alive\r\n
    Set-Cookie: X-Mapping-fjhppofk=417762F7E8CECAB0A67D3545510787EC; path=/\r\n
    Content-Length: 162\r\n
    \r\n
    [HTTP response 1/1]
    [Time since request: 0.055139000 seconds]
    [Request in frame: 668]
Line-based text data: text/html
Can you try sending a wget from that machine and spoof the useragent to Python-httplib2?
Good call. Looks like anything starting with Python- fails:

[root@staticadm.private.phx1 pradcliffe]# wget -v --user-agent='Python-' http://blog.mozilla.org/community/feed/ 
--2015-05-05 09:42:56--  http://blog.mozilla.org/community/feed/
Resolving blog.mozilla.org... 104.130.89.232
Connecting to blog.mozilla.org|104.130.89.232|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2015-05-05 09:42:56 ERROR 403: Forbidden.

[root@staticadm.private.phx1 pradcliffe]# wget -v --user-agent='blibble' http://blog.mozilla.org/community/feed/
--2015-05-05 09:43:04--  http://blog.mozilla.org/community/feed/
Resolving blog.mozilla.org... 104.130.89.232
Connecting to blog.mozilla.org|104.130.89.232|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/xml]
Saving to: “index.html”

    [ <=>                                   ] 82,007       452K/s   in 0.2s    

2015-05-05 09:43:04 (452 KB/s) - “index.html” saved [82007]
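
The same User-Agent comparison can be scripted. The sketch below is an illustration only: it runs against a local stub server that mimics the behaviour observed above (403 for any User-Agent starting with "Python-", 200 otherwise), so it needs no network access. StubHandler, probe and demo are hypothetical names, and the stub's rule is inferred from the wget runs, not from anything known about WPEngine's configuration.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubHandler(BaseHTTPRequestHandler):
    """Local stand-in that rejects User-Agents starting with 'Python-'."""

    def do_GET(self):
        agent = self.headers.get("User-Agent", "")
        blocked = agent.startswith("Python-")
        body = b"blocked" if blocked else b"<rss/>"
        self.send_response(403 if blocked else 200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def probe(url, user_agents):
    """Return {user_agent: HTTP status} after one GET per User-Agent."""
    # Ignore proxy environment variables so the local stub is reached directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    results = {}
    for agent in user_agents:
        request = urllib.request.Request(url, headers={"User-Agent": agent})
        try:
            with opener.open(request, timeout=10) as response:
                results[agent] = response.status
        except urllib.error.HTTPError as error:
            results[agent] = error.code
    return results


def demo():
    """Start the stub on a free local port and probe it with three agents."""
    server = HTTPServer(("127.0.0.1", 0), StubHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = "http://127.0.0.1:%d/community/feed/" % server.server_port
        return probe(url, ["Python-httplib2/$Rev$", "Python-", "blibble"])
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

Pointing probe at a real feed URL instead of the stub would repeat the wget experiment from Python.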
Well, now I guess the question is what to do about that. Thanks, Pir!
(In reply to Mike Hoye [:mhoye] from comment #17)
> Well, now I guess the question is what to do about that. Thanks, Pir!

I'm contacting WPEngine support about this and will make sure this is resolved. I'll keep you updated on the same.
Ticket #381385 opened with WPEngine. mhoye and the moc have been CC'ed.
Flags: needinfo?(achavez)
This is fixed. We patched the planet code to send a custom User-Agent ("venus" if anyone's wondering, from a similar patch that someone had made to a fork of the planet codebase).

As it happens WPEngine also changed stuff to allow "Python" in the UA, so our work was only maybe 15 minutes faster.
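
For anyone wondering what overriding the User-Agent looks like in practice, here is a minimal sketch using only the Python standard library. It is not the actual change made to planet (which, per the packet capture, fetches through httplib2); build_feed_request is a hypothetical helper, and only the "venus" value comes from this bug.

```python
import urllib.request

# "venus" is the User-Agent value mentioned in this bug; everything else here
# is illustrative and is not the actual patch applied to the planet code.
FEED_USER_AGENT = "venus"


def build_feed_request(url, user_agent=FEED_USER_AGENT):
    """Build a GET request that overrides the library's default User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    request = build_feed_request("http://blog.mozilla.org/community/feed/")
    # urllib stores header names capitalized as "User-agent".
    print(request.get_header("User-agent"))
```

The point is simply that the default library identifier is replaced with an explicit one, so a server-side filter on "Python-" no longer matches.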
Assignee: nobody → nmaul
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Blocks: 1154096