Closed Bug 1342194 Opened 8 years ago Closed 8 years ago

Add Dataset to Presto

Categories

(Cloud Services Graveyard :: Metrics: Pipeline, defect, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: harter, Assigned: robotblake)

Details

(Whiteboard: [SvcOps])

Please add the following table to Presto with the name "cliqz_profile_daily": s3://telemetry-parquet/harter/cliqz_profile_daily/v1/

Blake, let me know if you want to show me how to do this myself :).
Points: --- → 1
Priority: -- → P2
Whiteboard: [SvcOps]
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
I'm getting an error in re:dash on the search_counts column in this table. The query `select * from default.cliqz_profile_daily limit 10` fails with the error:

Error running query: Error opening Hive split s3://telemetry-parquet/harter/cliqz_profile_daily/v1/part-r-00000-691cf6b9-716b-40ae-9cc2-9a3680a13e58.snappy.parquet (offset=0, length=3825274): Expected MAP column 'search_counts.entry' entry field 0 to be primitive, but is required group key { optional binary _1 (UTF8); optional binary _2 (UTF8); }
Looks like re:dash is complaining about an empty Map in the search_counts column. I'll take a look at the Parquet file to make sure everything is in order before it reaches Presto.
I can read the parquet table from Spark without any problems. Frank, are there any known issues with reading maps into Hive?
Flags: needinfo?(fbertsch)
Hmm, nope. But that is an odd error message. The second part, "required group key", is the type of the first key [0]. Very odd. Blake, any ideas?

[0] https://github.com/prestodb/presto/blob/master/presto-hive/src/main/java/com/facebook/presto/hive/parquet/ParquetHiveRecordCursor.java#L1147
Flags: needinfo?(fbertsch) → needinfo?(bimsland)
The search_counts column contains a map((varchar, varchar), long). It looks like Presto requires the map key to be a primitive type, so I tried changing this to a map(varchar, long) by concatenating the elements of the tuple. Blake, can you refresh parquet2hive so we can test whether this fixes the problem? David, is that an acceptable solution?
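For illustration, the key-flattening workaround described above could look roughly like the sketch below. It is plain Python, not the actual Spark job (which is not shown in this bug). The function name `flatten_tuple_keys` and the "." separator are assumptions made for the example, and the sample key values are invented.

```python
from typing import Dict, Tuple


def flatten_tuple_keys(
    counts: Dict[Tuple[str, str], int], sep: str = "."
) -> Dict[str, int]:
    """Turn a map keyed by (str, str) tuples into a map keyed by plain strings.

    Hypothetical helper: joins the two tuple elements with `sep` so the
    resulting key is a primitive (string) type.
    """
    flat: Dict[str, int] = {}
    for (first, second), value in counts.items():
        key = f"{first}{sep}{second}"
        # Two different tuples can join to the same string if an element
        # already contains the separator, so fail loudly rather than
        # silently overwrite a count.
        if key in flat:
            raise ValueError(f"key collision after flattening: {key!r}")
        flat[key] = value
    return flat


# Example with made-up values:
example = {("google", "urlbar"): 3, ("yahoo", "searchbar"): 1}
flattened = flatten_tuple_keys(example)
```

The separator should be a character that cannot appear in either tuple element; otherwise distinct tuples may map to the same string, which is why the sketch raises on a collision.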
Status: RESOLVED → REOPENED
Flags: needinfo?(dzeber)
Resolution: FIXED → ---
Yes, that should be fine. I can confirm that I can now query the search_counts column without error. Thanks!
Flags: needinfo?(dzeber)
Status: REOPENED → RESOLVED
Closed: 8 years ago → 8 years ago
Flags: needinfo?(bimsland)
Resolution: --- → FIXED
Product: Cloud Services → Cloud Services Graveyard