Closed
Bug 1377548
Opened 7 years ago
Closed 7 years ago
Main_summary queries keep failing (when called from PySpark) with "no such file or directory" error
Categories
(Data Platform and Tools Graveyard :: Presto, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: joy, Unassigned)
Details
This is my query:

```sql
SELECT client_id,
       SUM(CASE WHEN active_ticks IS NULL THEN 0 ELSE active_ticks * 5 END) AS usg,
       LAST(CASE WHEN profile_creation_date <= '16968' THEN 1
                 ELSE 30 / (16997 - profile_creation_date + 1) END) AS opportunity
FROM main_summary
WHERE app_name = 'Firefox'
  AND submission_date >= '20160616'
  AND submission_date <= '20160715'
  AND profile_creation_date <= 16997
  AND sample_id < '5'
GROUP BY 1
```

and I get this error:

```
Py4JJavaError: An error occurred while calling o52.sql.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1709 in stage 0.0 failed 4 times, most recent failure: Lost task 1709.3 in stage 0.0 (TID 1766, ip-172-31-5-86.us-west-2.compute.internal): java.io.FileNotFoundException: No such file or directory 's3://telemetry-parquet/main_summary/v4/submission_date_s3=20160601/sample_id=0'
	at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:812)
```

(If I rerun this, I get the same error, but the missing file has a different sample_id.)
Comment 1•7 years ago
This appears to work fine now, though note that filtering on `submission_date_s3` will perform much better than filtering on `submission_date`, since it is a partitioning field.
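For illustration, the reporter's query could be rewritten to filter on the partition column instead; this is an untested sketch that assumes the `submission_date_s3` values cover the same range as `submission_date`:

```sql
-- Filtering on the partition column lets Spark prune S3 directories
-- (s3://telemetry-parquet/main_summary/v4/submission_date_s3=.../) instead
-- of scanning every partition.
SELECT client_id,
       SUM(CASE WHEN active_ticks IS NULL THEN 0 ELSE active_ticks * 5 END) AS usg,
       LAST(CASE WHEN profile_creation_date <= '16968' THEN 1
                 ELSE 30 / (16997 - profile_creation_date + 1) END) AS opportunity
FROM main_summary
WHERE app_name = 'Firefox'
  AND submission_date_s3 >= '20160616'   -- partition column, enables pruning
  AND submission_date_s3 <= '20160715'
  AND profile_creation_date <= 16997
  AND sample_id < '5'
GROUP BY 1
```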
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Updated•4 years ago
Product: Data Platform and Tools → Data Platform and Tools Graveyard