re:dash/PrestoDB are failing with count(distinct *)

RESOLVED FIXED

Opened: 2 years ago
Last modified: 6 days ago

People

Reporter: Dexter
Assignee: Unassigned

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [SvcOps])

Description (Reporter), 2 years ago
While working on a relatively simple query, I ran into some odd behaviour: whenever I try to count the number of distinct client IDs in a small subset of records, the query [1] fails.

If I simply count the number of records in the |sample| view, it returns the correct (and very small!) count: 10 (see the commented line).

If I use approx_distinct or COUNT(DISTINCT client_id) instead, the query fails with:

> Error running query:
> {
>   "errorCode": 65537,
>   "message": "Encountered too many errors talking to a worker node. The node may have crashed or be under too much load. This is probably a transient issue, so please retry your query in a few minutes. (getting task status http://172.31.27.22:8889/v1/task/20161006_084453_01966_tchp7.1.30 - 18 failures, time since last success 124.29s)",
>   "errorType": "INTERNAL_ERROR",
>   "failureInfo": {
>     "type": "com.facebook.presto.spi.PrestoException",
>     "message": "Encountered too many errors talking to a worker node. The node may have crashed or be under too much load. This is probably a transient issue, so please retry your query in a few minutes. (getting task status http://172.31.27.22:8889/v1/task/20161006_084453_01966_tchp7.1.30 - 18 failures, time since last success 124.29s)",
>     "suppressed": [
>       {
>         "type": "com.facebook.presto.server.remotetask.SimpleHttpResponseHandler.ServiceUnavailableException",
>         "message": "Server returned SERVICE_UNAVAILABLE: http://172.31.27.22:8889/v1/task/20161006_084453_01966_tchp7.1.30/status",
>         "suppressed": [],
>         "stack": [
>           "com.facebook.presto.server.remotetask.SimpleHttpResponseHandler.onSuccess(SimpleHttpResponseHandler.java:52)",
>           "com.facebook.presto.server.remotetask.SimpleHttpResponseHandler.onSuccess(SimpleHttpResponseHandler.java:27)",
>           "com.google.common.util.concurrent.Futures$6.run(Futures.java:1319)",
>           "io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:77)",
>           "java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)",
>           "java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)",
>           "java.lang.Thread.run(Thread.java:745)"
>         ]
>       },
>       ... (this identical entry appears 10 times in total)
>     ],
>     "stack": [
>       "com.facebook.presto.server.remotetask.RequestErrorTracker.requestFailed(RequestErrorTracker.java:126)",
>       "com.facebook.presto.server.remotetask.ContinuousTaskStatusFetcher.failed(ContinuousTaskStatusFetcher.java:186)",
>       "com.facebook.presto.server.remotetask.SimpleHttpResponseHandler.onSuccess(SimpleHttpResponseHandler.java:52)",
>       "com.facebook.presto.server.remotetask.SimpleHttpResponseHandler.onSuccess(SimpleHttpResponseHandler.java:27)",
>       "com.google.common.util.concurrent.Futures$6.run(Futures.java:1319)",
>       "io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:77)",
>       "java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)",
>       "java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)",
>       "java.lang.Thread.run(Thread.java:745)"
>     ]
>   },
>   "errorName": "TOO_MANY_REQUESTS_FAILED"
> }

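Given that the |sample| view holds only 10 records, this suggests that something is going wrong in the background (the error points to an unresponsive worker node) rather than in the query itself.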

[1] - https://sql.telemetry.mozilla.org/queries/1359/source
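Error payloads like the one quoted in the description are hard to read because the same suppressed exception is repeated many times. The sketch below collapses such a payload into its key fields (errorCode, errorName, errorType, and a count of suppressed exception types). It assumes the text is exactly what re:dash displays, i.e. a prefix such as "Error running query: " followed by the JSON object. The names `summarize_presto_error`, `PrestoErrorSummary` and `TRANSIENT_ERROR_NAMES` are hypothetical, and treating TOO_MANY_REQUESTS_FAILED as retryable is an illustrative assumption based only on the "probably a transient issue" wording in the message, not a rule published by Presto.

```python
import json
from collections import Counter
from dataclasses import dataclass


@dataclass
class PrestoErrorSummary:
    error_code: int
    error_name: str
    error_type: str
    # Suppressed exceptions grouped by their "type" field, with counts.
    suppressed_counts: Counter
    looks_transient: bool


# Hypothetical heuristic: error names we choose to treat as retryable.
# This set is an assumption for illustration, not an official Presto list.
TRANSIENT_ERROR_NAMES = {"TOO_MANY_REQUESTS_FAILED"}


def summarize_presto_error(text: str) -> PrestoErrorSummary:
    """Collapse a verbose Presto error payload into its key fields.

    Everything before the first '{' (e.g. 'Error running query: ') is
    ignored; the remainder must be a single JSON object.
    """
    payload = json.loads(text[text.index("{"):])
    failure = payload.get("failureInfo", {})
    suppressed = Counter(
        entry.get("type", "?") for entry in failure.get("suppressed", [])
    )
    return PrestoErrorSummary(
        error_code=payload.get("errorCode", -1),
        error_name=payload.get("errorName", ""),
        error_type=payload.get("errorType", ""),
        suppressed_counts=suppressed,
        looks_transient=payload.get("errorName") in TRANSIENT_ERROR_NAMES,
    )
```

Grouping suppressed entries with a Counter turns ten identical ServiceUnavailableException records into a single line, which makes it easier to see that every failure came from the same worker's task-status endpoint.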
Comment 1 (Reporter), 2 years ago
Blake, would you please take a look at this issue?
Flags: needinfo?(bimsland)
Whiteboard: [SvcOps]
I restarted the offending worker node a couple of hours after this occurred, and everything recovered.
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Flags: needinfo?(bimsland) (cleared)
Resolution: --- → FIXED

Updated

6 days ago
Product: Cloud Services → Cloud Services Graveyard