Bug 917022: MakeAPI server constantly crashing
Status: RESOLVED FIXED
Opened 11 years ago
Closed 11 years ago
Product/Component: Webmaker Graveyard :: MakeAPI (defect)
Tracking: Not tracked
Reporter: mjschranz
Assignee: jon
Attachments: 1 file
Apparently Chris knows about this, but I basically can't save anything in Popcorn Maker.
The save request just hangs and the server never responds. Dave seemed to think the MakeAPI was the cause.
Comment 1 (Assignee) • 11 years ago
Yes, things are definitely failing:
localhost:Downloads jon$ elbh makeapi-production
INSTANCE_ID i-29deef49 InService N/A N/A
INSTANCE_ID i-2bdeef4b InService N/A N/A
INSTANCE_ID i-320f4158 InService N/A N/A
INSTANCE_ID i-2ddf934f InService N/A N/A
INSTANCE_ID i-701a1113 InService N/A N/A
INSTANCE_ID i-8c02cbf7 InService N/A N/A
INSTANCE_ID i-8e02cbf5 InService N/A N/A
INSTANCE_ID i-fa1f4391 InService N/A N/A
INSTANCE_ID i-69af0913 InService N/A N/A
INSTANCE_ID i-6faf0915 InService N/A N/A
INSTANCE_ID i-9f4752f4 OutOfService Instance has failed at least the UnhealthyThreshold number of health checks consecutively. Instance
INSTANCE_ID i-f1859b99 InService N/A N/A
INSTANCE_ID i-599c9731 InService N/A N/A
INSTANCE_ID i-5f9c9737 InService N/A N/A
INSTANCE_ID i-b06378d1 OutOfService Instance has failed at least the UnhealthyThreshold number of health checks consecutively. Instance
INSTANCE_ID i-b26378d3 OutOfService Instance has failed at least the UnhealthyThreshold number of health checks consecutively. Instance
INSTANCE_ID i-107a8873 InService N/A N/A
Comment 2 (Assignee) • 11 years ago
I ran an instance in my terminal and got the following error after about a minute:
{"name":"makeapi","hostname":"i-b06378d1","pid":3556,"level":60,"err":{"message":"connect EADDRNOTAVAIL","name":"Error","stack":"Error: connect EADDRNOTAVAIL\n at errnoException (net.js:884:11)\n at connect (net.js:747:19)\n at net.js:825:9\n at asyncCallback (dns.js:68:16)\n at Object.onanswer [as oncomplete] (dns.js:121:9)","code":"EADDRNOTAVAIL"},"msg":"connect EADDRNOTAVAIL","time":"2013-09-17T09:49:55.639Z","v":0}
/var/www/makeapi/lib/logger.js:54
throw err;
^
Error: connect EADDRNOTAVAIL
at errnoException (net.js:884:11)
at connect (net.js:747:19)
at net.js:825:9
at asyncCallback (dns.js:68:16)
at Object.onanswer [as oncomplete] (dns.js:121:9)
Curiously enough, the server stopped responding, but the process appeared to still be running.
Comment 3 (Assignee) • 11 years ago
DATA
While running the MakeAPI in my terminal, I also ran this bash loop, which prints the number of open TCP/UDP sockets every five seconds:
ubuntu@i-b06378d1:~$ while (true); do date; netstat -an | egrep -c 'tcp|udp'; sleep 5; done;
Tue Sep 17 10:36:49 UTC 2013
1191
Tue Sep 17 10:36:54 UTC 2013
2781
Tue Sep 17 10:36:59 UTC 2013
4382
Tue Sep 17 10:37:04 UTC 2013
5795
Tue Sep 17 10:37:09 UTC 2013
7431
Tue Sep 17 10:37:15 UTC 2013
9024
Tue Sep 17 10:37:20 UTC 2013
10664
Tue Sep 17 10:37:25 UTC 2013
12026
Tue Sep 17 10:37:30 UTC 2013
13073
Tue Sep 17 10:37:36 UTC 2013
14633
Tue Sep 17 10:37:41 UTC 2013
15533
Tue Sep 17 10:37:46 UTC 2013
17217
Tue Sep 17 10:37:52 UTC 2013
18868
Tue Sep 17 10:37:57 UTC 2013
18181
Tue Sep 17 10:38:03 UTC 2013
19786
Tue Sep 17 10:38:08 UTC 2013
21247
Tue Sep 17 10:38:14 UTC 2013
20666
Tue Sep 17 10:38:19 UTC 2013
22377
Tue Sep 17 10:38:25 UTC 2013
22627
Tue Sep 17 10:38:30 UTC 2013
23357
Tue Sep 17 10:38:36 UTC 2013
25002
Tue Sep 17 10:38:41 UTC 2013
24385
Tue Sep 17 10:38:47 UTC 2013
25858
Tue Sep 17 10:38:53 UTC 2013
27476
Tue Sep 17 10:38:58 UTC 2013
27073
Tue Sep 17 10:39:04 UTC 2013
28260
As you can see, we're getting that EADDRNOTAVAIL error because we quickly exhaust the local ports available for outgoing connections. Now, what's using up all these ports?
ubuntu@i-b06378d1:~$ while (true); do date; netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n; sleep 5; done;
Tue Sep 17 11:29:53 UTC 2013
1 172.16.0.23
1 50.31.164.149
1 Address
1 servers)
4 212.184.128.155
5 10.139.0.148
921 10.152.173.219
Tue Sep 17 11:29:58 UTC 2013
1 50.31.164.149
1 Address
1 servers)
4 212.184.128.155
5 10.139.0.148
2518 10.152.173.219
Tue Sep 17 11:30:03 UTC 2013
1 172.16.0.23
1 Address
1 servers)
4 212.184.128.155
5 10.139.0.148
4096 10.152.173.219
Nearly all of them go to 10.152.173.219:9200, and 9200 is the port in our Elasticsearch config:
# Host and port for your Elastic search cluster
ELASTIC_SEARCH_URL='elasticsearch://makeapi-es.mofoprod.net:9200'
I don't know how to fix this (yet) but we do have a root cause!
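The per-host tally above also counts netstat's two header lines (the `Address` and `servers)` rows). A minimal Node.js sketch of the same tally that skips them, assuming the usual Linux `netstat -ntu` column layout; `countByRemoteHost` is a hypothetical helper name and the sample rows are made up for illustration:

```javascript
// Hypothetical helper: tally connections per remote host from `netstat -ntu` text.
function countByRemoteHost(netstatOutput) {
  const counts = new Map();
  for (const line of netstatOutput.split("\n")) {
    const fields = line.trim().split(/\s+/);
    // Data rows start with the protocol (tcp, tcp6, udp, udp6); headers do not.
    if (!/^(tcp|udp)/.test(fields[0]) || fields.length < 5) continue;
    const remote = fields[4]; // "Foreign Address" column, e.g. 10.152.173.219:9200
    const idx = remote.lastIndexOf(":");
    const host = idx === -1 ? remote : remote.slice(0, idx);
    counts.set(host, (counts.get(host) || 0) + 1);
  }
  // Ascending by count, like `sort -n` in the shell pipeline.
  return [...counts.entries()].sort((a, b) => a[1] - b[1]);
}

// Made-up sample in the assumed netstat layout (local addresses are illustrative).
const sample = [
  "Active Internet connections (w/o servers)",
  "Proto Recv-Q Send-Q Local Address           Foreign Address         State",
  "tcp        0      0 10.0.0.5:40001          10.152.173.219:9200     TIME_WAIT",
  "tcp        0      0 10.0.0.5:40002          10.152.173.219:9200     ESTABLISHED",
  "tcp        0      0 10.0.0.5:22             212.184.128.155:50311   ESTABLISHED",
].join("\n");

console.log(countByRemoteHost(sample));
```

On a live box the input could come from `require("child_process").execSync("netstat -ntu").toString()` instead of the sample string.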
Comment 4 • 11 years ago
JP showed me almost the same stack trace last night. I found references suggesting you can easily overload sockets in a big async loop if you don't pause to let them flush what's in their buffers. I know Chris uses async-like stuff to bundle results.
http://stackoverflow.com/questions/17588237/error-connect-eaddrnotavail-while-processing-big-async-loop
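One common way to keep a big async loop from opening thousands of sockets at once is to cap how many operations are in flight. A minimal sketch of that idea, under the assumption that each operation is a callback-style task; `createLimiter` is a hypothetical helper, not something from MakeAPI or mongoosastic:

```javascript
// Hypothetical helper: runs at most `limit` callback-style tasks at once
// and queues the rest until a running task calls done().
function createLimiter(limit) {
  const queue = [];
  let active = 0;
  let draining = false;

  function drain() {
    // A task that calls done() synchronously re-enters here; the loop
    // already running below will pick up the freed slot.
    if (draining) return;
    draining = true;
    try {
      while (active < limit && queue.length > 0) {
        const task = queue.shift();
        active += 1;
        let finished = false;
        task(function done() {
          if (finished) return; // ignore duplicate completion calls
          finished = true;
          active -= 1;
          drain();
        });
      }
    } finally {
      draining = false;
    }
  }

  return {
    push(task) {
      queue.push(task);
      drain();
    },
    stats() {
      return { active, queued: queue.length };
    },
  };
}
```

Each task would wrap one request to the search cluster and call `done` from its response or error callback, so at most `limit` sockets are open for this work at any time.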
Comment 5 (Assignee) • 11 years ago
This behaviour, where we exhaust the local sockets, is this guy's fault:
https://github.com/mozilla/MakeAPI/blob/master/lib/models/make.js#L144
Comment that out, and you don't spin up thousands of connections. Another solution would be to update the client to use keep-alive logic...
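For reference, keep-alive on the client side could look roughly like the sketch below. It assumes a Node.js release whose `http.Agent` accepts the `keepAlive` option, and it is not necessarily how the eventual fix works; `esAgent` and `esGet` are hypothetical names, and the `maxSockets` value is arbitrary:

```javascript
const http = require("http");

// Assumption: this Node.js version supports `keepAlive` on http.Agent.
// One shared agent reuses idle sockets and caps concurrent sockets per host,
// instead of opening a fresh local port for every request.
const esAgent = new http.Agent({ keepAlive: true, maxSockets: 10 });

// Hypothetical helper: host and port are taken from ELASTIC_SEARCH_URL in
// this bug; the function name and callback shape are illustrative only.
function esGet(path, callback) {
  const req = http.get(
    { host: "makeapi-es.mofoprod.net", port: 9200, path, agent: esAgent },
    (res) => {
      let body = "";
      res.setEncoding("utf8");
      res.on("data", (chunk) => {
        body += chunk;
      });
      res.on("end", () => callback(null, res.statusCode, body));
    }
  );
  // Without an 'error' listener, a failed connect (such as EADDRNOTAVAIL)
  // is thrown as an uncaught exception.
  req.on("error", callback);
  return req;
}
```

Every caller has to pass the same agent for sockets to be reused; creating a new agent per request would bring back the one-connection-per-call behaviour.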
Comment 6 (Assignee) • 11 years ago
Sent https://github.com/jamescarr/mongoosastic/pull/70 to get merged. I'll use my repo for our fix in the meantime.
Comment 7 (Assignee) • 11 years ago
Assignee: cade → jon
Attachment #805957 -
Flags: review?(cade)
Comment 8 • 11 years ago
Commit pushed to master at https://github.com/mozilla/MakeAPI
https://github.com/mozilla/MakeAPI/commit/0521524c8c134eb9a2829d369b0d2698792f5ebd
Bug 917022 - Handle error events from MongoDB-ES sychronization
Comment 9 • 11 years ago
Comment on attachment 805957 [details] [review]
https://github.com/mozilla/MakeAPI/pull/146
There's a hilarious async bug with how synchronize emits events that we are going to ignore for now. We've confirmed that the sync does indeed take place. It's only noticeable on small collections that can close the stream before Elasticsearch can respond to update calls.
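The kind of ordering problem described here (a "close" firing while update calls are still outstanding) is often avoided by counting pending operations and closing only when the source has ended and the count is back to zero. A minimal sketch of that pattern, which is an assumption about the general technique and not mongoosastic's actual code; `createSyncTracker`, `begin` and `end` are hypothetical names:

```javascript
const { EventEmitter } = require("events");

// Hypothetical tracker: emits "close" only after the source stream has ended
// AND every indexing call that was started has reported back.
function createSyncTracker() {
  const emitter = new EventEmitter();
  let pending = 0;
  let sourceEnded = false;
  let closed = false;

  function maybeClose() {
    if (!closed && sourceEnded && pending === 0) {
      closed = true;
      emitter.emit("close");
    }
  }

  // Call when a document is sent for indexing; returns its completion callback.
  emitter.begin = function begin() {
    pending += 1;
    let settled = false;
    return function finished(err, doc) {
      if (settled) return; // ignore duplicate completion calls
      settled = true;
      pending -= 1;
      // Note: emitting "error" with no listener attached throws in Node.js.
      if (err) emitter.emit("error", err);
      else emitter.emit("data", doc);
      maybeClose();
    };
  };

  // Call when the source collection stream has no more documents.
  emitter.end = function end() {
    sourceEnded = true;
    maybeClose();
  };

  return emitter;
}
```

With this shape, a small collection whose stream ends almost immediately still waits for the outstanding responses before "close" is emitted.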
R+
Attachment #805957 -
Flags: review?(cade) → review+
Comment 10 (Assignee) • 11 years ago
Everything is looking a-okay now. Let's hope it stays that way...
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Updated • 11 years ago
Attachment mime type: text/plain → text/x-github-pull-request