Closed Bug 1051978 Opened 10 years ago Closed 10 years ago

Network connectivity issue with addons.mozilla.org

Categories

(Cloud Services :: Operations: Marketplace, task)

x86_64
Linux
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jik, Unassigned)

I have a script which runs every hour and checks the contents of a particular page on addons.mozilla.org (https://addons.mozilla.org/en-US/thunderbird/addon/send-later-3/versions/) to confirm that they are what I expect.

Several weeks ago, this script began downloading only part of the page on roughly half of its runs.

I reproduced this issue easily with this command:

while true; do curl -m 10 --silent https://addons.mozilla.org/en-US/thunderbird/addon/send-later-3/versions/ | wc -c; sleep 5; done

When I run this command from an AWS EC2 us-east instance, it ends up getting a truncated version of the page about half the time:

$ while true; do curl -m 10 --silent https://addons.mozilla.org/en-US/thunderbird/addon/send-later-3/versions/ | wc -c; sleep 5; done
102723
102723
102723
19394
12133
0
102723
19394
12133
19394
0
0
102723
102723
19394
102723
12133
26653
3941
0
12133
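
To put a number on "about half the time", the sizes printed by the loop can be tallied. The following is a minimal sketch, not part of the original report: the function name summarize_sizes is made up for illustration, and it assumes that 102723 bytes (the largest size seen above) is the complete page.

```shell
# Tally download sizes read from stdin, one byte count per line.
# $1 is the size of a complete page (assumed: 102723 in the run above).
summarize_sizes() {
    awk -v full="$1" '
        $1 ~ /^[0-9]+$/ {
            total++
            if ($1 == full)   ok++        # whole page received
            else if ($1 == 0) empty++     # nothing received before the timeout
            else              partial++   # page cut off part way through
        }
        END {
            printf "total=%d full=%d truncated=%d empty=%d\n",
                   total, ok, partial, empty
        }'
}
```

A bounded version of the loop can be piped into it, for example:

for i in $(seq 20); do curl -m 10 --silent https://addons.mozilla.org/en-US/thunderbird/addon/send-later-3/versions/ | wc -c; sleep 5; done | summarize_sizes 102723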

I see similar results when I run the command from a Digital Ocean droplet in their New York data center, which is where I first noticed the problem.

However, when I run exactly the same command from my home computer (connected via RCN in Boston), every single download successfully fetches the whole page.

When I run "mtr" from all of these locations to addons.mozilla.org, I see only negligible packet loss, but I start to see more significant packet loss from EC2 and Digital Ocean when I add "-s 1000" to the mtr command, suggesting that there is an issue with larger packets getting lost.

Here are the last few hops of the MTR output from my home machine, which has no packet loss:

10. ae6.mpr4.phx2.us.above.net                          0.0%   102   70.1  74.4  69.1 128.9  10.1
11. 64.124.201.182.mozilla.com                          0.0%   102   69.2  73.9  69.2 103.5   6.8
12. v-1058.core1.market.phx1.mozilla.net                0.0%   102   73.0  73.1  69.6  88.9   2.9
13. addons-mozilla-org.mktns.services.phx1.mozilla.net  0.0%   102   74.8  76.4  73.1 102.6   4.3

And here's EC2:

17. ae6.mpr4.phx2.us.above.net                          0.0%   162   79.8  88.0  79.6 161.7  13.4
18. 64.124.201.182.mozilla.com                          1.9%   162   67.7  76.4  66.8 141.0  13.5
19. v-1058.core1.market.phx1.mozilla.net                2.5%   162   73.9  77.7  58.1 175.9  17.1
20. addons-mozilla-org.mktns.services.phx1.mozilla.net  0.0%   162   83.8  86.5  80.0 206.9  13.1

And here's Digital Ocean:

 7. mozilla-ic-140268-phx-b1.c.telia.net                0.0%   201  178.3  79.5  75.1 178.3  15.4
 8. v-1057.core1.market.phx1.mozilla.net                1.0%   201   76.6  76.8  76.4  81.9   0.4
 9. addons-mozilla-org.mktns.services.phx1.mozilla.net 29.5%   201   75.3  75.4  75.0  88.6   1.1
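
When comparing hop tables like the three above, it can help to print only the hops whose loss exceeds a threshold. This is a rough sketch under assumptions: lossy_hops is a hypothetical helper, and it expects lines laid out exactly like the excerpts above (hop number, host, loss percentage, then the remaining columns). Loss that appears only at an intermediate hop and not at the final one often reflects routers deprioritising their own replies rather than dropped traffic, so the last hop is usually the one to watch.

```shell
# Print hops whose loss exceeds a threshold (percent, default 1.0).
# Reads hop lines on stdin; assumed columns, as in the excerpts above:
#   hop. host loss% sent last avg best worst stdev
lossy_hops() {
    awk -v min="${1:-1.0}" '
        $1 ~ /^[0-9]+\.$/ && $3 ~ /%$/ {
            loss = $3
            sub(/%$/, "", loss)            # drop the trailing percent sign
            if (loss + 0 > min + 0) print $2, loss "%"
        }'
}
```

Fed the Digital Ocean excerpt with a threshold of 1, this prints only the final hop, at 29.5%.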

Any insight into what's going on here?
Status: NEW → UNCONFIRMED
Ever confirmed: false
Assignee: nobody → server-ops-amo
Component: Administration → Server Operations: AMO Operations
Product: addons.mozilla.org → mozilla.org
QA Contact: oremj
Version: unspecified → other
I've opened a trouble ticket with Telia to see if there is anything they can do. 

Telia trouble ticket number:  00362376
Depends on: 1052120
Early reports were that it was traffic transiting Zayo that had problems.
Zayo was experiencing a fiber cut, adding congestion to their network.
Netops chose to shut down our 2 connections to Zayo in PHX1 to try to 
stop that problem.  It did not fix the problem.

We also have 2 connections to Telia in PHX1.
James Barnell spoke with Telia, who told us they are seeing a high number of
errors on one of our interfaces.  (thanks for telling us Telia!) 
Our side of the connection is clean.  

I spoke with Zayo this morning; they confirmed that their fiber cut repair was completed
at 8:30am PST today.  So I have restored the connections to Zayo.

Now, we will shut down the suspect link to Telia, to see if that improves the
situation.
I have contacted Zayo and had them close their ticket 513000.

They provided me with the interface stats on their side.
One interface is completely clean, the other has ~1k CRC errors, but those errors are 
not accumulating.

Zayo was able to 
a) confirm that each link goes to a different router
b) verify that the CID I have on our operations map is correct

Here are the interface stats:

re1.mpr4.phx2.us> show interfaces xe-1/3/0 extensive | match "clear|flapped|link|neg|errors|description"
  Physical interface: xe-1/3/0, Enabled, Physical link is Up
  Description: 21439 Mozilla Corporation @ AZ-N48 via 21439-G002-000[12]
  Link-level type: Ethernet, MTU: 1514, LAN-PHY mode, Speed: 10Gbps, BPDU Error: None, Loopback: None, Source filtering: Disabled, Flow control: Enabled
  Link flags     : None
  Last flapped   : 2014-02-07 07:03:47 UTC (26w4d 09:01 ago)
  Statistics last cleared: Never
  Input errors:
    Errors: 0, Drops: 0, Framing errors: 0, Runts: 0, Policed discards: 0, L3 incompletes: 0, L2 channel errors: 0, L2 mismatch timeouts: 0, FIFO errors: 0, Resource errors: 0
  Output errors:
    Carrier transitions: 1, Errors: 0, Drops: 0, Collisions: 0, Aged packets: 0, FIFO errors: 0, HS link CRC errors: 0, MTU errors: 11, Resource errors: 0
    Bit errors                             0
    CRC/Align errors                         0                0
    FIFO errors                              0                0
    Description: T00539-00 Mozilla Corporation

re1.mpr4.phx2.us> show interfaces xe-1/3/0 extensive

Physical interface: xe-1/3/0, Enabled, Physical link is Up
  Interface index: 172, SNMP ifIndex: 567, Generation: 175
  Description: 21439 Mozilla Corporation @ AZ-N48 via 21439-G002-000[12]
  Link-level type: Ethernet, MTU: 1514, LAN-PHY mode, Speed: 10Gbps, BPDU Error: None, Loopback: None, Source filtering: Disabled, Flow control: Enabled
  Device flags   : Present Running
  Interface flags: SNMP-Traps Internal: 0x4000
  Link flags     : None
  CoS queues     : 8 supported, 8 maximum usable queues
  Hold-times     : Up 0 ms, Down 0 ms
  Current address: 00:19:e2:bd:39:20, Hardware address: 00:19:e2:bd:39:20
  Last flapped   : 2014-02-07 07:03:47 UTC (26w4d 09:01 ago)
  Statistics last cleared: Never
  Traffic statistics:
   Input  bytes  :      723656632107274            459614000 bps
   Output bytes  :      174260623847302             83808544 bps
   Input  packets:         758282226101                63591 pps
   Output packets:        1214487419638                72569 pps
   IPv6 transit statistics:
    Input  bytes  :      13976504089439
    Output bytes  :       2198258870553
    Input  packets:         20246432288
    Output packets:         22541887931
  Dropped traffic statistics due to STP State:
   Input  bytes  :                    0
   Output bytes  :                    0
   Input  packets:                    0
   Output packets:                    0
  Input errors:
    Errors: 0, Drops: 0, Framing errors: 0, Runts: 0, Policed discards: 0, L3 incompletes: 0, L2 channel errors: 0, L2 mismatch timeouts: 0, FIFO errors: 0, Resource errors: 0
  Output errors:
    Carrier transitions: 1, Errors: 0, Drops: 0, Collisions: 0, Aged packets: 0, FIFO errors: 0, HS link CRC errors: 0, MTU errors: 11, Resource errors: 0
  Egress queues: 8 supported, 8 in use
  Queue counters:       Queued packets  Transmitted packets      Dropped packets
    0 Best_Effort        1214153185062        1214153185062                    0
    1 Resv_Low                       0                    0                    0
    2 Resv_High                      0                    0                    0
    3 Network_Cont           334010633            334010633                    0
    4 Basic_VPN                      0                    0                    0
    5 Enhanced_VPN                   0                    0                    0
    6 Preferred_VP                   0                    0                    0
    7 Critical_VPN                  39                   39                    0
  Queue number:         Mapped forwarding classes
    0                   Best_Effort
    1                   Resv_Low
    2                   Resv_High
    3                   Network_Control
    4                   Basic_VPN
    5                   Enhanced_VPN
    6                   Preferred_VPN
    7                   Critical_VPN
  Active alarms  : None
  Active defects : None
  PCS statistics                      Seconds
    Bit errors                             0
    Errored blocks                         0
  MAC statistics:                      Receive         Transmit
    Total octets               738050831420804  200268113450911
    Total packets                 758282215684    1214487409546
    Unicast packets               758282208332    1214487384163
    Broadcast packets                     7339            11414
    Multicast packets                       13            13969
    CRC/Align errors                         0                0
    FIFO errors                              0                0
    MAC control frames                       0                0
    MAC pause frames                         0                0
    Oversized frames                         0
    Jabber frames                            0
    Fragment frames                          0
    VLAN tagged frames                       0
    Code violations                          0
  Filter statistics:
    Input packet count            758282215684
    Input packet rejects                     8
    Input DA rejects                         8
    Input SA rejects                         0
    Output packet count                           1214487409546
    Output packet pad count                                   0
    Output packet error count                                 0
    CAM destination filters: 0, CAM source filters: 0
  Packet Forwarding Engine configuration:
    Destination slot: 1
  CoS information:
    Direction : Output
    CoS transmit queue               Bandwidth               Buffer Priority   Limit
                              %            bps     %           usec
    0 Best_Effort            95     9500000000    95              0      low    none
    3 Network_Control         5      500000000     5              0      low    none
  Interface transmit statistics: Disabled

  Logical interface xe-1/3/0.0 (Index 362) (SNMP ifIndex 656) (Generation 189)
    Description: T00539-00 Mozilla Corporation
    Flags: SNMP-Traps 0x4004000 Encapsulation: ENET2
    Traffic statistics:
     Input  bytes  :      723656632107398
     Output bytes  :      174260318995544
     Input  packets:         758282226103
     Output packets:        1214487419647
     IPv6 transit statistics:
      Input  bytes  :      13976201768686
      Output bytes  :       1099346865246
      Input  packets:         20242320797
      Output packets:         11273362763
    Local statistics:
     Input  bytes  :           1399693628
     Output bytes  :           5176715788
     Input  packets:             25003359
     Output packets:             25404478
    Transit statistics:
     Input  bytes  :      723655232413770            459611296 bps
     Output bytes  :      174255142279756             83806032 bps
     Input  packets:         758257222744                63585 pps
     Output packets:        1214462015169                72567 pps
     IPv6 transit statistics:
      Input  bytes  :      13976201768686
      Output bytes  :       1099346865246
      Input  packets:         20242320797
      Output packets:         11273362763
    Protocol inet, MTU: 1500, Generation: 249, Route table: 0
      Flags: Sendbcast-pkt-to-re, uRPF, uRPF-loose
      RPF Failures: Packets: 110298, Bytes: 11864104
      Input Filters: xe-1/3/0.0-i
      Addresses, Flags: Is-Preferred Is-Primary
        Destination: 64.124.201.180/30, Local: 64.124.201.181, Broadcast: 64.124.201.183, Generation: 195
    Protocol inet6, MTU: 1500, Generation: 250, Route table: 0
      Flags: uRPF, uRPF-loose
      RPF Failures: Packets: 0, Bytes: 0
      Input Filters: v6-edge-filter-xe-1/3/0.0-i
      Addresses, Flags: Is-Preferred Is-Primary
        Destination: 2001:438:fffe::ec/126, Local: 2001:438:fffe::ed
    Generation: 197
      Addresses, Flags: Is-Preferred
        Destination: fe80::/64, Local: fe80::219:e2ff:febd:3920
    Protocol multiservice, MTU: Unlimited, Generation: 199
    Generation: 251, Route table: 0
      Policer: Input: per-ifd-arp-xe-1/3/0.0-inet-arp
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
re1.mpr3.phx2.us> show interfaces xe-1/3/0 extensive | match "clear|flapped|link|neg|errors|description"
  Physical interface: xe-1/3/0, Enabled, Physical link is Up
  Description: to T00539-00 Mozilla Corporation @ AZ-N48 via 21439-G001-000[12]
  Link-level type: Ethernet, MTU: 1514, LAN-PHY mode, Speed: 10Gbps, BPDU Error: None, Loopback: None, Source filtering: Disabled, Flow control: Enabled
  Link flags     : None
  Last flapped   : 2014-07-17 09:40:23 UTC (3w5d 06:25 ago)
  Statistics last cleared: 2014-04-01 13:39:30 UTC (19w0d 02:26 ago)
  Input errors:
    Errors: 1259, Drops: 0, Framing errors: 1259, Runts: 0, Policed discards: 0, L3 incompletes: 0, L2 channel errors: 0, L2 mismatch timeouts: 0, FIFO errors: 0, Resource errors: 0
  Output errors:
    Carrier transitions: 10, Errors: 0, Drops: 0, Collisions: 0, Aged packets: 0, FIFO errors: 0, HS link CRC errors: 0, MTU errors: 26, Resource errors: 0
    Bit errors                             5
    CRC/Align errors                      1259                0
    FIFO errors                           1439                0
    Description: T00539-00 Mozilla Corporation

re1.mpr3.phx2.us> show interfaces xe-1/3/0 extensive

Physical interface: xe-1/3/0, Enabled, Physical link is Up
  Interface index: 209, SNMP ifIndex: 576, Generation: 356
  Description: to T00539-00 Mozilla Corporation @ AZ-N48 via 21439-G001-000[12]
  Link-level type: Ethernet, MTU: 1514, LAN-PHY mode, Speed: 10Gbps, BPDU Error: None, Loopback: None, Source filtering: Disabled, Flow control: Enabled
  Device flags   : Present Running
  Interface flags: SNMP-Traps Internal: 0x4000
  Link flags     : None
  CoS queues     : 8 supported, 8 maximum usable queues
  Hold-times     : Up 0 ms, Down 0 ms
  Current address: 00:19:e2:bd:01:20, Hardware address: 00:19:e2:bd:01:20
  Last flapped   : 2014-07-17 09:40:23 UTC (3w5d 06:28 ago)
  Statistics last cleared: 2014-08-12 16:07:16 UTC (00:01:53 ago)
  Traffic statistics:
   Input  bytes  :           2904602160            203959048 bps
   Output bytes  :            299142044             19621096 bps
   Input  packets:              3074375                27381 pps
   Output packets:              2036360                18235 pps
   IPv6 transit statistics:
    Input  bytes  :           116706989
    Output bytes  :              884234
    Input  packets:               98287
    Output packets:               10540
  Dropped traffic statistics due to STP State:
   Input  bytes  :                    0
   Output bytes  :                    0
   Input  packets:                    0
   Output packets:                    0
  Input errors:
    Errors: 0, Drops: 0, Framing errors: 0, Runts: 0, Policed discards: 0, L3 incompletes: 0, L2 channel errors: 0, L2 mismatch timeouts: 0, FIFO errors: 0, Resource errors: 0
  Output errors:
    Carrier transitions: 0, Errors: 0, Drops: 0, Collisions: 0, Aged packets: 0, FIFO errors: 0, HS link CRC errors: 0, MTU errors: 0, Resource errors: 0
  Egress queues: 8 supported, 8 in use
  Queue counters:       Queued packets  Transmitted packets      Dropped packets
    0 Best_Effort              2070592              2070592                    0
    1 Resv_Low                       0                    0                    0
    2 Resv_High                      0                    0                    0
    3 Network_Cont                1564                 1564                    0
    4 Basic_VPN                      0                    0                    0
    5 Enhanced_VPN                   0                    0                    0
    6 Preferred_VP                   0                    0                    0
    7 Critical_VPN                   0                    0                    0
  Queue number:         Mapped forwarding classes
    0                   Best_Effort
    1                   Resv_Low
    2                   Resv_High
    3                   Network_Control
    4                   Basic_VPN
    5                   Enhanced_VPN
    6                   Preferred_VPN
    7                   Critical_VPN
  Active alarms  : None
  Active defects : None
  PCS statistics                      Seconds
    Bit errors                             0
    Errored blocks                         0
  MAC statistics:                      Receive         Transmit
    Total octets                    2958799646        342627326
    Total packets                      3070475          2034521
    Unicast packets                    3070475          2034521
    Broadcast packets                        0                0
    Multicast packets                        0                0
    CRC/Align errors                         0                0
    FIFO errors                              0                0
    MAC control frames                       0                0
    MAC pause frames                         0                0
    Oversized frames                         0
    Jabber frames                            0
    Fragment frames                          0
    VLAN tagged frames                       0
    Code violations                          0
  Filter statistics:
    Input packet count                 3070475
    Input packet rejects                     0
    Input DA rejects                         0
    Input SA rejects                         0
    Output packet count                                 2034521
    Output packet pad count                                   0
    Output packet error count                                 0
    CAM destination filters: 0, CAM source filters: 0
  Packet Forwarding Engine configuration:
    Destination slot: 1
  CoS information:
    Direction : Output
    CoS transmit queue               Bandwidth               Buffer Priority   Limit
                              %            bps     %           usec
    0 Best_Effort            95     9500000000    95              0      low    none
    3 Network_Control         5      500000000     5              0      low    none
  Interface transmit statistics: Disabled

  Logical interface xe-1/3/0.0 (Index 196619) (SNMP ifIndex 577) (Generation 348)
    Description: T00539-00 Mozilla Corporation
    Flags: SNMP-Traps 0x4004000 Encapsulation: ENET2
    Traffic statistics:
     Input  bytes  :           2904602108
     Output bytes  :            299139272
     Input  packets:              3074374
     Output packets:              2036360
     IPv6 transit statistics:
      Input  bytes  :           116703057
      Output bytes  :              884234
      Input  packets:               98233
      Output packets:               10540
    Local statistics:
     Input  bytes  :                13160
     Output bytes  :                37998
     Input  packets:                  230
     Output packets:                  231
    Transit statistics:
     Input  bytes  :           2904588948            203958840 bps
     Output bytes  :            299101274             19620416 bps
     Input  packets:              3074144                27381 pps
     Output packets:              2036129                18235 pps
     IPv6 transit statistics:
      Input  bytes  :           116703057
      Output bytes  :              884234
      Input  packets:               98233
      Output packets:               10540
    Protocol inet, MTU: 1500, Generation: 560, Route table: 0
      Flags: Sendbcast-pkt-to-re, uRPF, uRPF-loose
      RPF Failures: Packets: 0, Bytes: 0
      Input Filters: xe-1/3/0.0-i
      Addresses, Flags: Is-Preferred Is-Primary
        Destination: 64.124.201.176/30, Local: 64.124.201.177, Broadcast: 64.124.201.179, Generation: 399
    Protocol inet6, MTU: 1500, Generation: 561, Route table: 0
      Flags: uRPF, uRPF-loose
      RPF Failures: Packets: 0, Bytes: 0
      Input Filters: v6-edge-filter-xe-1/3/0.0-i
      Addresses, Flags: Is-Preferred Is-Primary
        Destination: 2001:438:fffe::e8/126, Local: 2001:438:fffe::e9
    Generation: 401
      Addresses, Flags: Is-Preferred
        Destination: fe80::/64, Local: fe80::219:e2ff:febd:120
    Protocol multiservice, MTU: Unlimited, Generation: 403
    Generation: 562, Route table: 0
      Policer: Input: per-ifd-arp-xe-1/3/0.0-inet-arp
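
Whether an error counter is "accumulating" comes down to comparing the same counter in two snapshots taken some time apart. A small sketch of that comparison follows; rx_counter and counter_delta are hypothetical helper names, and the parsing assumes the row layout shown in the output above (counter name followed by the receive value).

```shell
# Print the first value after the named counter (the Receive column
# in the MAC statistics rows) from interface output on stdin.
rx_counter() {
    awk -v name="$1" '
        !seen && (pos = index($0, name)) {
            rest = substr($0, pos + length(name))
            split(rest, f, " ")    # whitespace split; f[1] is the first value
            print f[1] + 0
            seen = 1               # only report the first matching row
        }'
}

# Print how much a counter grew between an earlier ($1) and later ($2) reading.
counter_delta() {
    echo $(( $2 - $1 ))
}
```

With two saved snapshots (before.txt and after.txt are placeholder file names), counter_delta "$(rx_counter 'CRC/Align errors' < before.txt)" "$(rx_counter 'CRC/Align errors' < after.txt)" prints 0 when the counter has not moved. If the statistics were cleared between the two snapshots, as they were on mpr3 above, the later reading on its own is the count since the clear.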
--
Status: UNCONFIRMED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Component: Server Operations: AMO Operations → Operations: Marketplace
Product: mozilla.org → Mozilla Services