This is my query:

    select client_id,
           sum(case when active_ticks is null then 0 else active_ticks*5 end) as usg,
           last(case when profile_creation_date <= '16968' then 1
                     else 30 / (16997 - profile_creation_date + 1) end) as opportunity
    from main_summary
    where app_name='Firefox'
      and submission_date >= '20160616'
      and submission_date <= '20160715'
      and profile_creation_date <= 16997
      and sample_id < '5'
    group by 1

and I get this error:

    Py4JJavaError: An error occurred while calling o52.sql.
    : org.apache.spark.SparkException: Job aborted due to stage failure: Task 1709 in stage 0.0 failed 4 times, most recent failure: Lost task 1709.3 in stage 0.0 (TID 1766, ip-172-31-5-86.us-west-2.compute.internal): java.io.FileNotFoundException: No such file or directory 's3://telemetry-parquet/main_summary/v4/submission_date_s3=20160601/sample_id=0'
        at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:812)
    ( No such file or directory 's3://telemetry-parquet/main_summary/v4/submission_date_s3=20160601/sample_id=0' )

(If I rerun this, I get the same error, but the missing file has a different sample_id.)
This appears to work fine now. Note, though, that filtering on `submission_date_s3` will perform much better than filtering on `submission_date`, since `submission_date_s3` is a partitioning field.
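For illustration, here is a minimal sketch of the same query with the date filter moved to the partitioning field. The S3 path in the error names the partition `submission_date_s3=20160601`, which lies outside the requested date range, so the `submission_date` filter was probably not limiting which partitions Spark read. The function name `build_usage_query` and its parameters are hypothetical, and the commented-out `sqlContext.sql(...)` call assumes a PySpark session with `main_summary` already registered as a table; everything else is copied from the original query.

```python
def build_usage_query(start_s3="20160616", end_s3="20160715"):
    """Return the usage query, filtering on the partition column.

    Hypothetical helper: only the two date predicates differ from the
    query in the report (submission_date -> submission_date_s3).
    """
    return f"""
        select client_id,
               sum(case when active_ticks is null then 0 else active_ticks*5 end) as usg,
               last(case when profile_creation_date <= '16968' then 1
                         else 30 / (16997 - profile_creation_date + 1) end) as opportunity
        from main_summary
        where app_name='Firefox'
          and submission_date_s3 >= '{start_s3}'
          and submission_date_s3 <= '{end_s3}'
          and profile_creation_date <= 16997
          and sample_id < '5'
        group by 1
    """


query = build_usage_query()

# Assumption: a PySpark SQLContext named sqlContext is available and
# main_summary is registered as a table.
# result = sqlContext.sql(query)
```

Because `submission_date_s3` appears in the S3 directory layout, predicates on it should let Spark skip whole directories instead of reading every file and filtering rows afterwards.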
Status: NEW → RESOLVED
Last Resolved: 8 months ago
Resolution: --- → FIXED