Closed Bug 917022 Opened 11 years ago Closed 11 years ago

MakeAPI server constantly crashing

Categories

(Webmaker Graveyard :: MakeAPI, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mjschranz, Assigned: jon)

References

Details

Attachments

(1 file)

Apparently Chris knows about this, but I basically can't save anything in Popcorn Maker. It just hangs and never winds up returning from the server. Dave seemed to be under the impression that it was due to the MakeAPI.
Depends on: 917080
Yes, things are definitely failing:

    localhost:Downloads jon$ elbh makeapi-production
    INSTANCE_ID i-29deef49 InService N/A N/A
    INSTANCE_ID i-2bdeef4b InService N/A N/A
    INSTANCE_ID i-320f4158 InService N/A N/A
    INSTANCE_ID i-2ddf934f InService N/A N/A
    INSTANCE_ID i-701a1113 InService N/A N/A
    INSTANCE_ID i-8c02cbf7 InService N/A N/A
    INSTANCE_ID i-8e02cbf5 InService N/A N/A
    INSTANCE_ID i-fa1f4391 InService N/A N/A
    INSTANCE_ID i-69af0913 InService N/A N/A
    INSTANCE_ID i-6faf0915 InService N/A N/A
    INSTANCE_ID i-9f4752f4 OutOfService Instance has failed at least the UnhealthyThreshold number of health checks consecutively. Instance
    INSTANCE_ID i-f1859b99 InService N/A N/A
    INSTANCE_ID i-599c9731 InService N/A N/A
    INSTANCE_ID i-5f9c9737 InService N/A N/A
    INSTANCE_ID i-b06378d1 OutOfService Instance has failed at least the UnhealthyThreshold number of health checks consecutively. Instance
    INSTANCE_ID i-b26378d3 OutOfService Instance has failed at least the UnhealthyThreshold number of health checks consecutively. Instance
    INSTANCE_ID i-107a8873 InService N/A N/A
I ran an instance in my terminal and got the following error after about a minute:

    {"name":"makeapi","hostname":"i-b06378d1","pid":3556,"level":60,"err":{"message":"connect EADDRNOTAVAIL","name":"Error","stack":"Error: connect EADDRNOTAVAIL\n at errnoException (net.js:884:11)\n at connect (net.js:747:19)\n at net.js:825:9\n at asyncCallback (dns.js:68:16)\n at Object.onanswer [as oncomplete] (dns.js:121:9)","code":"EADDRNOTAVAIL"},"msg":"connect EADDRNOTAVAIL","time":"2013-09-17T09:49:55.639Z","v":0}

    /var/www/makeapi/lib/logger.js:54
        throw err;
        ^
    Error: connect EADDRNOTAVAIL
        at errnoException (net.js:884:11)
        at connect (net.js:747:19)
        at net.js:825:9
        at asyncCallback (dns.js:68:16)
        at Object.onanswer [as oncomplete] (dns.js:121:9)

Curiously enough, the server appeared to not work, but it's as if the process was still running?
DATA

While running the makeapi in my terminal, I also ran this bash script, which outputs the number of open tcp/udp sockets:

    ubuntu@i-b06378d1:~$ while (true); do date; netstat -an | egrep -c 'tcp|udp'; sleep 5; done;
    Tue Sep 17 10:36:49 UTC 2013
    1191
    Tue Sep 17 10:36:54 UTC 2013
    2781
    Tue Sep 17 10:36:59 UTC 2013
    4382
    Tue Sep 17 10:37:04 UTC 2013
    5795
    Tue Sep 17 10:37:09 UTC 2013
    7431
    Tue Sep 17 10:37:15 UTC 2013
    9024
    Tue Sep 17 10:37:20 UTC 2013
    10664
    Tue Sep 17 10:37:25 UTC 2013
    12026
    Tue Sep 17 10:37:30 UTC 2013
    13073
    Tue Sep 17 10:37:36 UTC 2013
    14633
    Tue Sep 17 10:37:41 UTC 2013
    15533
    Tue Sep 17 10:37:46 UTC 2013
    17217
    Tue Sep 17 10:37:52 UTC 2013
    18868
    Tue Sep 17 10:37:57 UTC 2013
    18181
    Tue Sep 17 10:38:03 UTC 2013
    19786
    Tue Sep 17 10:38:08 UTC 2013
    21247
    Tue Sep 17 10:38:14 UTC 2013
    20666
    Tue Sep 17 10:38:19 UTC 2013
    22377
    Tue Sep 17 10:38:25 UTC 2013
    22627
    Tue Sep 17 10:38:30 UTC 2013
    23357
    Tue Sep 17 10:38:36 UTC 2013
    25002
    Tue Sep 17 10:38:41 UTC 2013
    24385
    Tue Sep 17 10:38:47 UTC 2013
    25858
    Tue Sep 17 10:38:53 UTC 2013
    27476
    Tue Sep 17 10:38:58 UTC 2013
    27073
    Tue Sep 17 10:39:04 UTC 2013
    28260

As you can see, we're getting that EADDRNOTAVAIL error because we quickly exhaust the local ports available for outgoing connections. Now, what's using up all these ports?

    ubuntu@i-b06378d1:~$ while (true); do date; netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n; sleep 5; done;
    Tue Sep 17 11:29:53 UTC 2013
    1 172.16.0.23
    1 50.31.164.149
    1 Address
    1 servers)
    4 212.184.128.155
    5 10.139.0.148
    921 10.152.173.219
    Tue Sep 17 11:29:58 UTC 2013
    1 50.31.164.149
    1 Address
    1 servers)
    4 212.184.128.155
    5 10.139.0.148
    2518 10.152.173.219
    Tue Sep 17 11:30:03 UTC 2013
    1 172.16.0.23
    1 Address
    1 servers)
    4 212.184.128.155
    5 10.139.0.148
    4096 10.152.173.219

We're connecting to 10.152.173.219:9200.
    # Host and port for your Elastic search cluster
    ELASTIC_SEARCH_URL='elasticsearch://makeapi-es.mofoprod.net:9200'

I don't know how to fix this (yet) but we do have a root cause!
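For anyone repeating this diagnosis from Node rather than the shell, here is a rough sketch of the same per-host tally that the `awk | cut | sort | uniq -c` pipeline produces. The function name `countByRemoteHost` and the sample rows are made up for illustration (the sample was not captured from the affected instance). Unlike the shell version, it skips the netstat header lines, so the stray "Address" and "servers)" rows do not appear. Like `cut -d: -f1`, it assumes IPv4 addresses.

```javascript
// Count connections per remote host from `netstat -ntu` text output.
// Roughly mirrors: awk '{print $5}' | cut -d: -f1 | sort | uniq -c
function countByRemoteHost(netstatOutput) {
  const counts = new Map();
  for (const line of netstatOutput.split('\n')) {
    const fields = line.trim().split(/\s+/);
    // Data rows start with the protocol (tcp, tcp6, udp, ...); skip headers.
    if (!/^(tcp|udp)/.test(fields[0]) || fields.length < 5) continue;
    // Column 5 is the foreign address as host:port (IPv4 assumed).
    const host = fields[4].split(':')[0];
    counts.set(host, (counts.get(host) || 0) + 1);
  }
  return counts;
}

// Illustrative sample only; local addresses and ports are hypothetical.
const sample = [
  'Active Internet connections (w/o servers)',
  'Proto Recv-Q Send-Q Local Address Foreign Address State',
  'tcp 0 0 10.0.0.5:41000 10.152.173.219:9200 TIME_WAIT',
  'tcp 0 0 10.0.0.5:41001 10.152.173.219:9200 TIME_WAIT',
  'tcp 0 0 10.0.0.5:22 10.139.0.148:51515 ESTABLISHED',
].join('\n');

const counts = countByRemoteHost(sample);
console.log([...counts.entries()]);
```

A single remote host dominating the tally, as 10.152.173.219 does above, is the signal that one client is opening a new connection per request.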
JP showed me almost the same stack trace last night, and I found references saying you can easily overload a socket with async code if you don't pause to let it flush and write out some of what's in its buffer. I know Chris uses async-like stuff to bundle results. http://stackoverflow.com/questions/17588237/error-connect-eaddrnotavail-while-processing-big-async-loop
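If an unbounded async loop of the kind described in that Stack Overflow question were the trigger, one common mitigation is to cap how many operations are in flight at once. The sketch below is a generic illustration, not MakeAPI code: `mapWithLimit` and `fakeIndex` are hypothetical names, and the fake worker only simulates a slow network call with a timer.

```javascript
// Run `worker` over `items` with at most `limit` calls in flight at a time.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  // Each runner pulls the next unclaimed index until none are left.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runners = [];
  for (let i = 0; i < Math.min(limit, items.length); i++) {
    runners.push(runner());
  }
  await Promise.all(runners);
  return results;
}

// Demo: a fake worker (stand-in for a network call) that records peak concurrency.
let inFlight = 0;
let peak = 0;
const fakeIndex = async (doc) => {
  inFlight++;
  peak = Math.max(peak, inFlight);
  await new Promise((resolve) => setTimeout(resolve, 5));
  inFlight--;
  return doc.id;
};

const docs = Array.from({ length: 20 }, (_, i) => ({ id: i }));
const done = mapWithLimit(docs, 4, fakeIndex).then((ids) => {
  console.log('indexed', ids.length, 'docs; peak in flight:', peak);
  return ids;
});
```

Capping concurrency bounds the number of sockets open at any moment, though it does not by itself make the client reuse connections.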
This behaviour, where we exhaust the local sockets, is this guy's fault: https://github.com/mozilla/MakeAPI/blob/master/lib/models/make.js#L144 Comment that out, and you don't spin up thousands of connections. Another solution would be to update the client to use keep-alive logic...
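To illustrate the keep-alive idea in general terms: in Node.js versions whose `http.Agent` accepts the `keepAlive` option, a shared agent lets requests reuse sockets instead of opening one per call, and `maxSockets` caps concurrent connections per host. This is only a sketch under those assumptions, not the change that was actually made to the client. The names `esAgent` and `esGet` and the `maxSockets` value are hypothetical. The host and port are taken from the config quoted earlier in this bug.

```javascript
const http = require('http');

// Shared agent: keep sockets open between requests and cap how many
// connections may be open to one host at a time (10 is an arbitrary choice).
const esAgent = new http.Agent({ keepAlive: true, maxSockets: 10 });

// Hypothetical helper: GET a path from the Elasticsearch host via the shared agent.
function esGet(path) {
  return new Promise((resolve, reject) => {
    const req = http.request(
      { host: 'makeapi-es.mofoprod.net', port: 9200, path, method: 'GET', agent: esAgent },
      (res) => {
        let body = '';
        res.setEncoding('utf8');
        res.on('data', (chunk) => { body += chunk; });
        // Consuming the response fully lets the agent return the socket to its pool.
        res.on('end', () => resolve({ status: res.statusCode, body }));
      }
    );
    req.on('error', reject);
    req.end();
  });
}

console.log('maxSockets per host:', esAgent.maxSockets);
```

Passing the same agent to every request is the important part. Creating a new agent per request would defeat the reuse.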
Depends on: 879432
Sent https://github.com/jamescarr/mongoosastic/pull/70 to get merged. I'll use my repo for our fix in the meantime.
Assignee: cade → jon
Attachment #805957 - Flags: review?(cade)
Comment on attachment 805957 [details] [review] https://github.com/mozilla/MakeAPI/pull/146 There's a hilarious async bug with how synchronize emits events that we are going to ignore for now. We've confirmed that the sync does indeed take place. It's only noticeable on small collections that can close the stream before Elasticsearch can respond to update calls. R+
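For readers curious about the class of bug mentioned in that review comment, here is a generic sketch of one way to avoid signalling completion before outstanding update calls have returned: count pending calls and emit `close` only once the source is exhausted and the count is back to zero. This does not reflect how mongoosastic's synchronize is implemented. `trackedSync` and `indexDoc` are hypothetical names, and the indexing call in the usage line is faked with a timer.

```javascript
const { EventEmitter } = require('events');

// Emits 'data' per indexed doc and 'close' only after the source has been
// fully read AND every pending index call has called back.
function trackedSync(docs, indexDoc) {
  const emitter = new EventEmitter();
  let pending = 0;
  let ended = false;
  let count = 0;

  function maybeClose() {
    if (ended && pending === 0) emitter.emit('close', count);
  }

  // Defer the work so callers can attach listeners first.
  setImmediate(() => {
    for (const doc of docs) {
      pending++;
      indexDoc(doc, (err) => {
        pending--;
        if (err) {
          emitter.emit('error', err);
        } else {
          count++;
          emitter.emit('data', doc);
        }
        maybeClose();
      });
    }
    ended = true;
    maybeClose(); // covers empty input and synchronous callbacks
  });

  return emitter;
}

// Usage with a fake indexer that responds after a short delay.
const sync = trackedSync(['a', 'b', 'c'], (doc, cb) => setTimeout(cb, 5));
sync.on('close', (n) => console.log('synchronized', n, 'documents'));
```

With this shape, a small collection cannot trigger `close` early, because the final check runs only after the last callback has decremented the pending count.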
Attachment #805957 - Flags: review?(cade) → review+
Everything is looking a-okay now. Let's hope it stays that way...
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Attachment mime type: text/plain → text/x-github-pull-request
Blocks: 943926