Closed Bug 1324639 Opened 8 years ago Closed 7 years ago

Queries on Presto are failing with "Encountered too many errors talking to a worker node"

Categories

(Cloud Services Graveyard :: Metrics: Pipeline, defect, P1)

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1319086

People

(Reporter: flod, Assigned: robotblake)

Details

(Whiteboard: [SvcOps])

For the past 10 hours or so I haven't been able to run queries on Presto.

I've just forked an existing query, and when I try to run it I get a huge error message starting with:

Error running query: {"errorCode":65537,"message":"Encountered too many errors talking to a worker node. The node may have crashed or be under too much load. This is probably a transient issue, so please retry your query in a few minutes. (getting task status http://172.31.18.17:8889/v1/task/20161220_045726_00046_q4taj.1.5 - 20 failures, time since last success 122.78s)","errorType":"INTERNAL_ERROR","failureInfo":{"type":"com.facebook.presto.spi.PrestoException","message":"Encountered too many errors talking to a worker node. The node may have crashed or be under too much load. This is probably a transient issue, so please retry your query in a few minutes. (getting task status http://172.31.18.17:8889/v1/task/20161220_045726_00046_q4taj.1.5 - 20 failures, time since last success 122.78s)","suppressed":[{"type":"com.facebook.presto.server.remotetask.SimpleHttpResponseHandler.ServiceUnavailableException","message":"Server returned SERVICE_UNAVAILABLE: http://172.31.18.17:8889/v1/task/20161220_045726_00046_q4taj.1.5/status","suppressed":[],"stack":["com.facebook.presto.server.remotetask.SimpleHttpResponseHandler.onSuccess(SimpleHttpResponseHandler.java:52)","com.facebook.presto.server.remotetask.SimpleHttpResponseHandler.onSuccess(SimpleHttpResponseHandler.java:27)","com.google.common.util.concurrent.Futures$6.run(Futures.java:1319)","io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:77)","java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)","java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)","java.lang.Thread.run(Thread.java:745)"]},{"type":"com.facebook.presto.server.remotetask.SimpleHttpResponseHandler.ServiceUnavailableException","message":"Server returned SERVICE_UNAVAILABLE: http://172.31.18.17:8889/v1/task/20161220_045726_00046_q4taj.1.5/status","suppressed":[],"stack":
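The useful details in the message above (which worker, how many failed status calls, how long since the last success) are buried in the parenthesised suffix. A minimal sketch of pulling them out in Python follows; the function name `parse_worker_error` and the regular expression are illustrative assumptions based only on the message text quoted here, not on any Presto API.

```python
import re

# Assumed shape of the suffix, taken from the message quoted in this bug:
# "(getting task status <url> - <n> failures, time since last success <s>s)"
DETAIL = re.compile(
    r"\((?P<action>[^()]+?) (?P<url>https?://\S+) - "
    r"(?P<failures>\d+) failures, "
    r"time since last success (?P<seconds>\d+(?:\.\d+)?)s\)"
)


def parse_worker_error(message):
    """Return the details from the message suffix as a dict, or None if absent."""
    match = DETAIL.search(message)
    if match is None:
        return None
    return {
        "action": match.group("action"),
        "url": match.group("url"),
        "failures": int(match.group("failures")),
        "seconds_since_success": float(match.group("seconds")),
    }


message = (
    "Encountered too many errors talking to a worker node. The node may have "
    "crashed or be under too much load. This is probably a transient issue, "
    "so please retry your query in a few minutes. (getting task status "
    "http://172.31.18.17:8889/v1/task/20161220_045726_00046_q4taj.1.5 - "
    "20 failures, time since last success 122.78s)"
)
info = parse_worker_error(message)
```

Here `info["url"]` identifies the worker task whose status endpoint kept returning SERVICE_UNAVAILABLE, which is the detail most worth including when reporting the failure.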
Flags: needinfo?(bimsland)
Whiteboard: [SvcOps]
Severity: normal → blocker
The other error I was seeing intermittently is:
> Error running query: Should not have nextUri if failed 

If I shorten the query's run time by adding a `LIMIT NNNN` clause, the problem seems to happen less often.
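Since the error text itself suggests retrying after a few minutes, a client-side stopgap could look roughly like the sketch below. `run_with_retry` and the `run_query` callable are hypothetical names, and the assumption that a failed query surfaces as a `RuntimeError` is mine; this would not have fixed the underlying problem, which persisted for hours.

```python
import time


def run_with_retry(run_query, attempts=3, base_delay=60.0, sleep=time.sleep):
    """Call run_query(), retrying with exponential backoff on RuntimeError.

    run_query is a hypothetical zero-argument callable that submits the
    query and returns its result; it is assumed to raise RuntimeError
    when the query fails.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return run_query()
        except RuntimeError as err:
            last_error = err
            # Wait before the next attempt, but not after the final one.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

The `sleep` parameter is injectable only so the helper can be exercised without actually waiting.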
Points: --- → 3
Priority: -- → P1
Assignee: nobody → bimsland
This is the same underlying issue so marking it as a dupe.
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(bimsland)
Resolution: --- → DUPLICATE
Product: Cloud Services → Cloud Services Graveyard