Closed Bug 710449 Opened 14 years ago Closed 14 years ago

Latency spikes between brasstacks.mozilla.com and buildbot-es.metrics.sjc1.mozilla.com:9200

Categories

(Infrastructure & Operations Graveyard :: NetOps, task)

x86_64
Windows 7
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jgriffin, Unassigned)

Details

We are experiencing occasional large latency spikes for HTTP requests from brasstacks.mozilla.com to ElasticSearch on buildbot-es.metrics.sjc1.mozilla.com:9200. This manifests as very slow loading times for the OrangeFactor website: http://brasstacks.mozilla.com/orangefactor/?display=&tree=all. The problem does not appear to be caused by ES: the same ES requests are fast from my machine, even while the requests made from brasstacks are very slow. The difference is significant: about 6s for a request from my machine, vs 2 or 3 minutes for the same request made from brasstacks during the latency spikes. Brasstacks is not CPU- or RAM-bound during the spikes, so brasstacks itself doesn't appear to be the cause of the problem.
We'd like to be able to perform some sampling when the latency is actually occurring. Do you have a feel for how often you encounter the problem? Do you have timestamps for a specific event in the past? We're also curious if the latency occurs while opening the actual connection, or while waiting for a response to the HTTP request. A 2-3 minute response time generally exceeds the TCP connection timeout. Also, where is your machine actually located (for comparison)? Are you connected over VPN when running the tests against ES?
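One way to answer the "opening the connection vs. waiting for the response" question is to time each phase separately. The following is a minimal Python sketch, not something that was run for this bug; the helper names `timed` and `time_request_phases` are hypothetical, and it assumes a plain HTTP/1.0 GET is enough to get a response from the ES port.

```python
import socket
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.monotonic()
    result = fn(*args, **kwargs)
    return result, time.monotonic() - start


def time_request_phases(host, port, path="/", timeout=30.0):
    """Time DNS lookup, TCP connect, and first response byte separately.

    Hypothetical helper for illustration. Only the first address returned
    by the resolver is tried, so repeated runs against a round-robin name
    may hit different servers.
    """
    # Phase 1: name resolution on its own.
    infos, dns_s = timed(socket.getaddrinfo, host, port,
                         type=socket.SOCK_STREAM)
    family, socktype, proto, _canon, sockaddr = infos[0]

    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)
    try:
        # Phase 2: TCP handshake to the resolved address.
        _, connect_s = timed(sock.connect, sockaddr)
        # Phase 3: send a minimal request and wait for the first byte back.
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
        sock.sendall(request)
        _, response_s = timed(sock.recv, 1)
    finally:
        sock.close()

    return {
        "address": sockaddr[0],
        "dns_s": dns_s,
        "connect_s": connect_s,
        "response_s": response_s,
    }
```

Calling `time_request_phases("buildbot-es.metrics.sjc1.mozilla.com", 9200)` from brasstacks during a spike would show which phase dominates. A connect that hangs until the timeout raises an `OSError`, which is itself a useful signal that the delay is in connection setup rather than in ES.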
Luckily, I just managed to catch the reported behavior at 16:40 PST. It looks like it is related to slow DNS resolution. Jonathan, can you confirm that brasstacks is configured to connect to buildbot-es.metrics.sjc1.mozilla.com by name and not by IP?
Issue resolved. buildbot-es.metrics.sjc1.mozilla.com is actually a DNS round-robin record pointing at three different servers:

10.2.72.53
10.2.72.54
10.2.72.55

Access from sm-brasstacks was only granted to 10.2.72.53. Roughly two out of every three HTTP requests would fail as they tried to connect to inaccessible servers. The necessary permissions have now been granted.
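A quick way to spot this kind of partial access is to probe every address behind the name rather than whichever one the resolver hands back first. This is a rough Python sketch under that assumption; the function names `resolve_all`, `can_connect`, and `check_backends` are hypothetical and were not part of the actual fix.

```python
import socket


def resolve_all(host, port):
    """Return the distinct IP addresses DNS returns for host, in order."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses = []
    for info in infos:
        ip = info[4][0]  # sockaddr is the fifth field; its first item is the IP
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def can_connect(ip, port, timeout=3.0):
    """Return True if a TCP connection to (ip, port) succeeds within timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable hosts.
        return False


def check_backends(host, port, resolve=resolve_all, probe=can_connect):
    """Map each address behind host to whether it accepts connections.

    resolve and probe are injectable so the logic can be exercised
    without touching the network.
    """
    return {ip: probe(ip, port) for ip in resolve(host, port)}
```

Run from sm-brasstacks as `check_backends("buildbot-es.metrics.sjc1.mozilla.com", 9200)`, the access rules described above would be expected to show only 10.2.72.53 as reachable before the permissions were granted.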
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
(In reply to Derek Moore from comment #2)
> Luckily, I just managed to catch the reported behavior at 16:40 PST. It looks like it is related to slow DNS resolution.
>
> Jonathan, can you confirm that brasstacks is configured to connect to buildbot-es.metrics.sjc1.mozilla.com by name and not by IP?

Yes, we connect to buildbot-es using name and not IP. Thanks for the fix!
Product: mozilla.org → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard