Closed Bug 1308131 Opened 8 years ago Closed 8 years ago

re:dash/PrestoDB are failing with count(distinct *)

Categories

(Cloud Services Graveyard :: Metrics: Pipeline, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Dexter, Unassigned)

Details

(Whiteboard: [SvcOps])

While working on a relatively simple query, I stumbled upon a weird behaviour: whenever I try to count the number of distinct client ids in a small subset of records, the query [1] dies.

If I simply count the number of records in the |sample| view, it returns the correct (and very small!) count: 10 (see the commented line).

If I use approx_distinct or COUNT(DISTINCT client_id) instead, the query fails with:

> Error running query: {"errorCode":65537,"message":"Encountered too many errors talking to a worker node. The node may have crashed or be under too much load. This is probably a transient issue, so please retry your query in a few minutes. (getting task status http://172.31.27.22:8889/v1/task/20161006_084453_01966_tchp7.1.30 - 18 failures, time since last success 124.29s)","errorType":"INTERNAL_ERROR","failureInfo":{"type":"com.facebook.presto.spi.PrestoException","message":"Encountered too many errors talking to a worker node. The node may have crashed or be under too much load. This is probably a transient issue, so please retry your query in a few minutes. (getting task status http://172.31.27.22:8889/v1/task/20161006_084453_01966_tchp7.1.30 - 18 failures, time since last success 124.29s)","suppressed":[{"type":"com.facebook.presto.server.remotetask.SimpleHttpResponseHandler.ServiceUnavailableException","message":"Server returned SERVICE_UNAVAILABLE: http://172.31.27.22:8889/v1/task/20161006_084453_01966_tchp7.1.30/status","suppressed":[],"stack":["com.facebook.presto.server.remotetask.SimpleHttpResponseHandler.onSuccess(SimpleHttpResponseHandler.java:52)","com.facebook.presto.server.remotetask.SimpleHttpResponseHandler.onSuccess(SimpleHttpResponseHandler.java:27)","com.google.common.util.concurrent.Futures$6.run(Futures.java:1319)","io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:77)","java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)","java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)","java.lang.Thread.run(Thread.java:745)"]},{"type":"com.facebook.presto.server.remotetask.SimpleHttpResponseHandler.ServiceUnavailableException","message":"Server returned SERVICE_UNAVAILABLE: 
http://172.31.27.22:8889/v1/task/20161006_084453_01966_tchp7.1.30/status","suppressed":[],"stack":["com.facebook.presto.server.remotetask.SimpleHttpResponseHandler.onSuccess(SimpleHttpResponseHandler.java:52)","com.facebook.presto.server.remotetask.SimpleHttpResponseHandler.onSuccess(SimpleHttpResponseHandler.java:27)","com.google.common.util.concurrent.Futures$6.run(Futures.java:1319)","io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:77)","java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)","java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)","java.lang.Thread.run(Thread.java:745)"]},{"type":"com.facebook.presto.server.remotetask.SimpleHttpResponseHandler.ServiceUnavailableException","message":"Server returned SERVICE_UNAVAILABLE: http://172.31.27.22:8889/v1/task/20161006_084453_01966_tchp7.1.30/status","suppressed":[],"stack":["com.facebook.presto.server.remotetask.SimpleHttpResponseHandler.onSuccess(SimpleHttpResponseHandler.java:52)","com.facebook.presto.server.remotetask.SimpleHttpResponseHandler.onSuccess(SimpleHttpResponseHandler.java:27)","com.google.common.util.concurrent.Futures$6.run(Futures.java:1319)","io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:77)","java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)","java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)","java.lang.Thread.run(Thread.java:745)"]},{"type":"com.facebook.presto.server.remotetask.SimpleHttpResponseHandler.ServiceUnavailableException","message":"Server returned SERVICE_UNAVAILABLE: 
http://172.31.27.22:8889/v1/task/20161006_084453_01966_tchp7.1.30/status","suppressed":[],"stack":["com.facebook.presto.server.remotetask.SimpleHttpResponseHandler.onSuccess(SimpleHttpResponseHandler.java:52)","com.facebook.presto.server.remotetask.SimpleHttpResponseHandler.onSuccess(SimpleHttpResponseHandler.java:27)","com.google.common.util.concurrent.Futures$6.run(Futures.java:1319)","io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:77)","java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)","java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)","java.lang.Thread.run(Thread.java:745)"]},{"type":"com.facebook.presto.server.remotetask.SimpleHttpResponseHandler.ServiceUnavailableException","message":"Server returned SERVICE_UNAVAILABLE: http://172.31.27.22:8889/v1/task/20161006_084453_01966_tchp7.1.30/status","suppressed":[],"stack":["com.facebook.presto.server.remotetask.SimpleHttpResponseHandler.onSuccess(SimpleHttpResponseHandler.java:52)","com.facebook.presto.server.remotetask.SimpleHttpResponseHandler.onSuccess(SimpleHttpResponseHandler.java:27)","com.google.common.util.concurrent.Futures$6.run(Futures.java:1319)","io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:77)","java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)","java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)","java.lang.Thread.run(Thread.java:745)"]},{"type":"com.facebook.presto.server.remotetask.SimpleHttpResponseHandler.ServiceUnavailableException","message":"Server returned SERVICE_UNAVAILABLE: 
http://172.31.27.22:8889/v1/task/20161006_084453_01966_tchp7.1.30/status","suppressed":[],"stack":["com.facebook.presto.server.remotetask.SimpleHttpResponseHandler.onSuccess(SimpleHttpResponseHandler.java:52)","com.facebook.presto.server.remotetask.SimpleHttpResponseHandler.onSuccess(SimpleHttpResponseHandler.java:27)","com.google.common.util.concurrent.Futures$6.run(Futures.java:1319)","io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:77)","java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)","java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)","java.lang.Thread.run(Thread.java:745)"]},{"type":"com.facebook.presto.server.remotetask.SimpleHttpResponseHandler.ServiceUnavailableException","message":"Server returned SERVICE_UNAVAILABLE: http://172.31.27.22:8889/v1/task/20161006_084453_01966_tchp7.1.30/status","suppressed":[],"stack":["com.facebook.presto.server.remotetask.SimpleHttpResponseHandler.onSuccess(SimpleHttpResponseHandler.java:52)","com.facebook.presto.server.remotetask.SimpleHttpResponseHandler.onSuccess(SimpleHttpResponseHandler.java:27)","com.google.common.util.concurrent.Futures$6.run(Futures.java:1319)","io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:77)","java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)","java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)","java.lang.Thread.run(Thread.java:745)"]},{"type":"com.facebook.presto.server.remotetask.SimpleHttpResponseHandler.ServiceUnavailableException","message":"Server returned SERVICE_UNAVAILABLE: 
http://172.31.27.22:8889/v1/task/20161006_084453_01966_tchp7.1.30/status","suppressed":[],"stack":["com.facebook.presto.server.remotetask.SimpleHttpResponseHandler.onSuccess(SimpleHttpResponseHandler.java:52)","com.facebook.presto.server.remotetask.SimpleHttpResponseHandler.onSuccess(SimpleHttpResponseHandler.java:27)","com.google.common.util.concurrent.Futures$6.run(Futures.java:1319)","io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:77)","java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)","java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)","java.lang.Thread.run(Thread.java:745)"]},{"type":"com.facebook.presto.server.remotetask.SimpleHttpResponseHandler.ServiceUnavailableException","message":"Server returned SERVICE_UNAVAILABLE: http://172.31.27.22:8889/v1/task/20161006_084453_01966_tchp7.1.30/status","suppressed":[],"stack":["com.facebook.presto.server.remotetask.SimpleHttpResponseHandler.onSuccess(SimpleHttpResponseHandler.java:52)","com.facebook.presto.server.remotetask.SimpleHttpResponseHandler.onSuccess(SimpleHttpResponseHandler.java:27)","com.google.common.util.concurrent.Futures$6.run(Futures.java:1319)","io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:77)","java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)","java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)","java.lang.Thread.run(Thread.java:745)"]},{"type":"com.facebook.presto.server.remotetask.SimpleHttpResponseHandler.ServiceUnavailableException","message":"Server returned SERVICE_UNAVAILABLE: 
http://172.31.27.22:8889/v1/task/20161006_084453_01966_tchp7.1.30/status","suppressed":[],"stack":["com.facebook.presto.server.remotetask.SimpleHttpResponseHandler.onSuccess(SimpleHttpResponseHandler.java:52)","com.facebook.presto.server.remotetask.SimpleHttpResponseHandler.onSuccess(SimpleHttpResponseHandler.java:27)","com.google.common.util.concurrent.Futures$6.run(Futures.java:1319)","io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:77)","java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)","java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)","java.lang.Thread.run(Thread.java:745)"]}],"stack":["com.facebook.presto.server.remotetask.RequestErrorTracker.requestFailed(RequestErrorTracker.java:126)","com.facebook.presto.server.remotetask.ContinuousTaskStatusFetcher.failed(ContinuousTaskStatusFetcher.java:186)","com.facebook.presto.server.remotetask.SimpleHttpResponseHandler.onSuccess(SimpleHttpResponseHandler.java:52)","com.facebook.presto.server.remotetask.SimpleHttpResponseHandler.onSuccess(SimpleHttpResponseHandler.java:27)","com.google.common.util.concurrent.Futures$6.run(Futures.java:1319)","io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:77)","java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)","java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)","java.lang.Thread.run(Thread.java:745)"]},"errorName":"TOO_MANY_REQUESTS_FAILED"} 

Since the failure shows up even on such a small input, this seems to suggest that the problem is on the worker node side rather than in the query itself.
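For triage, the error payload quoted above can be boiled down to a few fields. The following is only a minimal Python sketch: `summarize_presto_error` and `SAMPLE_ERROR` are hypothetical names introduced here, the sample is an abbreviated copy of the payload from this report (message shortened, one suppressed exception kept, stacks omitted), and the regular expressions assume the free-text message keeps the wording seen above, which is not a stable interface.

```python
import json
import re

# Abbreviated copy of the payload quoted in this report: the message is
# shortened and only one suppressed exception is kept, with its stack omitted.
SAMPLE_ERROR = json.dumps({
    "errorCode": 65537,
    "message": (
        "Encountered too many errors talking to a worker node. "
        "(getting task status "
        "http://172.31.27.22:8889/v1/task/20161006_084453_01966_tchp7.1.30"
        " - 18 failures, time since last success 124.29s)"
    ),
    "errorType": "INTERNAL_ERROR",
    "failureInfo": {
        "type": "com.facebook.presto.spi.PrestoException",
        "suppressed": [
            {
                "type": "com.facebook.presto.server.remotetask."
                        "SimpleHttpResponseHandler.ServiceUnavailableException",
                "message": "Server returned SERVICE_UNAVAILABLE: "
                           "http://172.31.27.22:8889/v1/task/"
                           "20161006_084453_01966_tchp7.1.30/status",
            }
        ],
    },
    "errorName": "TOO_MANY_REQUESTS_FAILED",
})


def summarize_presto_error(payload: str) -> dict:
    """Pull the fields useful for triage out of an error payload (hypothetical helper)."""
    error = json.loads(payload)
    message = error.get("message", "")
    # Assumption: the worker address and failure count are embedded in the
    # free-text message in the same form as in the report above.
    worker = re.search(r"https?://([\d.]+:\d+)/v1/task/", message)
    failures = re.search(r"(\d+) failures", message)
    suppressed = error.get("failureInfo", {}).get("suppressed", [])
    return {
        "error_name": error.get("errorName"),
        "error_type": error.get("errorType"),
        "worker": worker.group(1) if worker else None,
        "failures": int(failures.group(1)) if failures else None,
        # Keep only the short class name of each distinct suppressed exception.
        "suppressed_types": sorted(
            {s.get("type", "").rsplit(".", 1)[-1] for s in suppressed}
        ),
    }


if __name__ == "__main__":
    print(summarize_presto_error(SAMPLE_ERROR))
```

Reduced this way, the report points at a single worker (172.31.27.22:8889) repeatedly answering SERVICE_UNAVAILABLE, which is consistent with a node-level problem rather than a problem with the query.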

[1] - https://sql.telemetry.mozilla.org/queries/1359/source
Blake, would you please take a look at this issue?
Flags: needinfo?(bimsland)
Whiteboard: [SvcOps]
I restarted the offending worker node a couple of hours after this occurred, and everything recovered.
Status: NEW → RESOLVED
Closed: 8 years ago
Flags: needinfo?(bimsland)
Resolution: --- → FIXED
Product: Cloud Services → Cloud Services Graveyard