Closed
Bug 1342194
Opened 8 years ago
Closed 8 years ago
Add Dataset to Presto
Categories
(Cloud Services Graveyard :: Metrics: Pipeline, defect, P2)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: harter, Assigned: robotblake)
References
Details
(Whiteboard: [SvcOps])
Please add the following table to Presto with the name "cliqz_profile_daily".
s3://telemetry-parquet/harter/cliqz_profile_daily/v1/
Blake, let me know if you want to show me how to do this myself :).
Updated•8 years ago
Points: --- → 1
Priority: -- → P2
Updated•8 years ago
Whiteboard: [SvcOps]
Updated•8 years ago (Assignee)
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Comment 1•8 years ago
I'm getting an error in re:dash on the search_counts column in this table.
The query
`select * from default.cliqz_profile_daily limit 10`
fails with the error:
Error running query: Error opening Hive split s3://telemetry-parquet/harter/cliqz_profile_daily/v1/part-r-00000-691cf6b9-716b-40ae-9cc2-9a3680a13e58.snappy.parquet (offset=0, length=3825274): Expected MAP column 'search_counts.entry' entry field 0 to be primitive, but is required group key { optional binary _1 (UTF8); optional binary _2 (UTF8); }
Comment 2•8 years ago (Reporter)
Looks like re:dash is complaining about an empty Map in the search_counts column. I'll take a look at the parquet file to make sure everything is working before presto.
Comment 3•8 years ago (Reporter)
I can read the parquet table from Spark without any problems. Frank, are there any known issues with reading maps into Hive?
Flags: needinfo?(fbertsch)
Comment 4•8 years ago
Hmm, nope. But that is an odd error message. The second part, "required group key", is the type of the first key [0]. Very odd.
Blake, any ideas?
[0] https://github.com/prestodb/presto/blob/master/presto-hive/src/main/java/com/facebook/presto/hive/parquet/ParquetHiveRecordCursor.java#L1147
Flags: needinfo?(fbertsch) → needinfo?(bimsland)
Comment 5•8 years ago (Reporter)
The search_counts column contains a map((varchar, varchar), long). It looks like Presto requires the key to be a primitive type, so I tried changing this to a map(varchar, long) by concatenating the elements of the tuple.
Blake, can you refresh parquet2hive so we can test whether this fixes the problem? David, is that an acceptable solution?
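For illustration, the key change described in this comment can be sketched in plain Python. This is a minimal sketch and not the job's actual code: the function name, the "|" separator, and the handling of missing tuple elements are all assumptions, since the bug does not say how the elements were joined.

```python
from typing import Dict, Optional, Tuple

# Hypothetical separator; the bug does not state which one was used.
SEPARATOR = "|"


def flatten_search_counts(
    counts: Dict[Tuple[Optional[str], Optional[str]], int],
    sep: str = SEPARATOR,
) -> Dict[str, int]:
    """Turn a map keyed by a 2-tuple into a map keyed by one string.

    Presto rejected the tuple-keyed map because it expects map keys to be
    primitive, so each tuple is joined into a single string key.
    """
    flat: Dict[str, int] = {}
    for (first, second), count in counts.items():
        # Both tuple fields are optional in the Parquet schema from the
        # error message; treating None as an empty string is an assumption.
        key = sep.join("" if part is None else part for part in (first, second))
        # Sum counts if two tuples happen to collapse to the same string.
        flat[key] = flat.get(key, 0) + count
    return flat


example = {("google", "urlbar"): 3, ("yahoo", None): 1}
flattened = flatten_search_counts(example)
```

One caveat with this approach: if the separator can occur inside either element, distinct tuples may collapse to the same key, so the separator would need to be chosen with the data in mind.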
Status: RESOLVED → REOPENED
Flags: needinfo?(dzeber)
Resolution: FIXED → ---
Comment 6•8 years ago
Yes, that should be fine. I can confirm that I can now query the search_counts column without error. Thanks!
Flags: needinfo?(dzeber)
Updated•8 years ago (Reporter)
Status: REOPENED → RESOLVED
Closed: 8 years ago → 8 years ago
Flags: needinfo?(bimsland)
Resolution: --- → FIXED
Updated•7 years ago
Product: Cloud Services → Cloud Services Graveyard