Closed Bug 1308169 Opened 8 years ago Closed 8 years ago

spark-csv package should be included by default in our environment

Categories

(Cloud Services Graveyard :: Metrics: Pipeline, defect)

Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: rvitillo, Unassigned)

References

Details

User Story

The spark-csv package [1] should be included by default in our Python Spark environment. This was previously the case but some recent changes might have reverted that. The change should be applied to both Airflow and ATMO jobs.

[1] https://github.com/databricks/spark-csv
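As a sketch of what having spark-csv available by default enables, the Spark 1.x DataFrame reader can load a CSV through the package's data source. This assumes a `SparkContext` named `sc` is already available (as in an ATMO notebook); the S3 path is hypothetical.

```python
# Sketch, assuming spark-csv is on the classpath and `sc` exists (Spark 1.x).
from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)
df = (sqlContext.read
      .format("com.databricks.spark.csv")      # data source provided by spark-csv
      .option("header", "true")                # first line is a header row
      .option("inferSchema", "true")           # infer column types from the data
      .load("s3://example-bucket/data.csv"))   # hypothetical input path
df.printSchema()
```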
No description provided.
Blocks: 1283447
User Story: (updated)
Issue has not been resolved. Currently, the following commands work, and spark-csv can be used to load files:

pyspark --packages com.databricks:spark-csv_2.10:1.2.0
spark-shell --packages com.databricks:spark-csv_2.10:1.2.0

But Jupyter still doesn't load the package correctly. Others have run into this issue, but the solutions suggested there don't seem to work (https://github.com/databricks/spark-csv/issues/247).
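One workaround commonly suggested for the Jupyter case is to pass the package through the `PYSPARK_SUBMIT_ARGS` environment variable before the kernel starts, so that the Py4J gateway is launched with the package on the classpath. This is a sketch of that approach, not a confirmed fix for this environment:

```shell
# Config fragment: set before launching the Jupyter kernel / notebook server,
# so the JVM backing pyspark is started with spark-csv available.
export PYSPARK_SUBMIT_ARGS="--packages com.databricks:spark-csv_2.10:1.2.0 pyspark-shell"
```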
Mauro, any thoughts about how to fix this?
Flags: needinfo?(mdoglio)
Assignee: fbertsch → nobody
We are going to close this bug as not-fixed, as this functionality will be available in Spark 2.0 (available in < month). If you don't need this job for the time being, please consider terminating it.
Flags: needinfo?(vfilippov)
(In reply to Frank Bertsch [:frank] from comment #3)
> We are going to close this bug as not-fixed, as this functionality will be
> available in Spark 2.0 (available in < month). If you don't need this job
> for the time being, please consider terminating it.

Sounds good!
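For reference, the reason the bug could be closed: Spark 2.0 absorbed spark-csv's functionality into the built-in DataFrame reader, so no extra package is needed. A minimal sketch of the equivalent Spark 2.0 code, assuming a `SparkSession` is created as usual (the input path is hypothetical):

```python
# Sketch: Spark 2.0+ has a native CSV data source, replacing spark-csv.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv-example").getOrCreate()
df = (spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("s3://example-bucket/data.csv"))   # hypothetical input path
```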
Flags: needinfo?(vfilippov)
Status: NEW → RESOLVED
Closed: 8 years ago
Flags: needinfo?(mdoglio)
Resolution: --- → WONTFIX
Product: Cloud Services → Cloud Services Graveyard