Closed Bug 1304100 Opened 8 years ago Closed 8 years ago

Unknown field 'binary' in parquet2hive

Categories

(Cloud Services Graveyard :: Metrics: Pipeline, defect)

Type: defect
Priority: Not set
Severity: major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: frank, Assigned: frank)

Details

When parquet2hive is run on a dataset that includes a binary field, the table is not created and Hive reports the following error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.UnsupportedOperationException: Unknown field type: binary
This can be reproduced on the cluster with:
parquet2hive s3://telemetry-parquet/client_count -ulv | bash
This used to work with p2h 0.2.7:
[hadoop@ip-172-31-27-197 ~]$ pip freeze | grep parquet2hive
You are using pip version 6.1.1, however version 8.1.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
parquet2hive==0.2.7
[hadoop@ip-172-31-27-197 ~]$ parquet2hive s3://telemetry-parquet/client_count -ulv
Analyzing dataset client_count, v2016032020160920
hive -hiveconf hive.support.sql11.reserved.keywords=false -e 'drop table if exists client_count_v2016032020160920; create external table client_count_v2016032020160920(`activity_date` string, `devtools_toolbox_opened` boolean, `loop_activity_open_panel` boolean, `normalized_channel` string, `country` string, `locale` string, `app_name` string, `app_version` string, `e10s_enabled` boolean, `e10s_cohort` string, `os` string, `os_version` string, `hll` binary) stored as parquet location '"'s3://telemetry-parquet/client_count/v2016032020160920'"'; msck repair table client_count_v2016032020160920;'
hive -e 'drop table if exists client_count; create external table client_count(`activity_date` string, `devtools_toolbox_opened` boolean, `loop_activity_open_panel` boolean, `normalized_channel` string, `country` string, `locale` string, `app_name` string, `app_version` string, `e10s_enabled` boolean, `e10s_cohort` string, `os` string, `os_version` string, `hll` binary) stored as parquet location '"'s3://telemetry-parquet/client_count/v2016032020160920'"'; msck repair table client_count;'
Severity: normal → major
Assignee: nobody → fbertsch
It turns out I misunderstood what this bug is about. parquet2hive is not compatible with older Hive versions (e.g. 1.0.0), such as the one installed on clusters created by atmo v1. That's a non-issue, though, since we are moving to atmo v2, which ships with Hive 2.1.0.
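A pre-flight check could catch this case before the generated DDL is piped to bash on an older cluster. The sketch below is hypothetical and not part of parquet2hive: the helpers `binary_columns` and `safe_for_hive` are illustrative names, the regex assumes the backtick-quoted `name type` column format printed in the p2h output in this bug, and it conservatively treats only Hive 2.1.0 or later (the atmo v2 version mentioned here) as handling binary Parquet columns.

```python
import re
from typing import List, Tuple

# Matches backtick-quoted column definitions such as `hll` binary, as printed
# by parquet2hive in this bug. Only simple (non-nested) types are handled.
COLUMN_RE = re.compile(r"`([^`]+)`\s+(\w+)")


def binary_columns(ddl: str) -> List[str]:
    """Hypothetical helper: names of columns declared with Hive type binary."""
    return [name for name, hive_type in COLUMN_RE.findall(ddl)
            if hive_type.lower() == "binary"]


def safe_for_hive(ddl: str, hive_version: Tuple[int, int, int]) -> bool:
    """Hypothetical check: True if the DDL is expected to run on this Hive.

    Assumption: versions older than 2.1.0 (e.g. the 1.0.0 on atmo v1) may reject
    binary Parquet columns with "Unknown field type: binary".
    """
    return hive_version >= (2, 1, 0) or not binary_columns(ddl)
```

For the client_count DDL above, `binary_columns` returns `['hll']`, so `safe_for_hive(ddl, (1, 0, 0))` is False while `safe_for_hive(ddl, (2, 1, 0))` is True.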
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Product: Cloud Services → Cloud Services Graveyard