Closed Bug 1356699 Opened 7 years ago Closed 7 years ago

Set up minio for integration tests against python_etl

Tracking

(Not tracked)

Status:

RESOLVED WONTFIX

People

(Reporter: amiyaguchi, Unassigned)

Details

Anthony Miyaguchi [:amiyaguchi]

Reporter

Description

7 years ago

Minio is an open-source, S3-compatible object store that can be run locally. Much of our data pipeline relies on S3 for data storage in production. Minio could isolate and simplify testing of various components in the data pipeline.

python_etl is a good candidate for this kind of infrastructure because S3 is both the input and the output location for many ETL jobs scheduled in Airflow, e.g. churn to churn_to_csv. The repository also already has tests that depend on a local installation of Spark, and these run in continuous integration.

There are a few problems with using minio. The proprietary Hadoop binaries on EMR have an incompatible notion of the S3 URI prefix, so tests would need to use s3a:// consistently while production switches over to s3://. Minio also requires a Hadoop binary of version 2.8 or above, which is not currently distributed as a prepackaged bundle. [1]
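To make the prefix problem concrete, here is a minimal sketch of how a job could switch between s3a:// in tests and s3:// in production. The helper names, the `ETL_S3_SCHEME` environment variable, and the example bucket are all hypothetical and not part of python_etl. The `fs.s3a.*` keys are standard Hadoop S3A settings, but the values a local minio needs (path-style access, no SSL) are assumptions that would have to be checked against the actual setup.

```python
import os
from urllib.parse import urlsplit, urlunsplit

# Schemes treated as interchangeable S3 prefixes in this sketch.
S3_SCHEMES = {"s3", "s3a", "s3n"}


def with_scheme(uri, scheme):
    """Return `uri` with its S3-style scheme replaced by `scheme`."""
    parts = urlsplit(uri)
    if parts.scheme not in S3_SCHEMES:
        raise ValueError("not an S3-style URI: %r" % uri)
    return urlunsplit(parts._replace(scheme=scheme))


def resolve_uri(uri, env=None):
    """Pick the scheme from the hypothetical ETL_S3_SCHEME variable.

    Defaults to "s3" (production); tests would set the variable to "s3a".
    """
    env = os.environ if env is None else env
    return with_scheme(uri, env.get("ETL_S3_SCHEME", "s3"))


def minio_spark_conf(endpoint, access_key, secret_key):
    """Spark settings pointing the s3a connector at a local endpoint.

    Assumption: the local server needs path-style access and plain HTTP.
    """
    return {
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        "spark.hadoop.fs.s3a.path.style.access": "true",
        "spark.hadoop.fs.s3a.connection.ssl.enabled": "false",
    }
```

A test run would export `ETL_S3_SCHEME=s3a` and pass the dictionary entries to the Spark session builder, while production leaves the variable unset so that every path resolves to s3://.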

There should be at least one integration test against churn and churn_to_csv that demonstrates how minio could be used more broadly to validate our infrastructure.


[1] https://github.com/minio/minio/issues/2965

Anthony Miyaguchi [:amiyaguchi]

Reporter

Updated

7 years ago

Points: --- → 3

Priority: -- → P2

Roberto Agostino Vitillo (:rvitillo)

Comment 1

7 years ago

Minio might be overkill for this. I would suggest using moto's stand-alone server mode [1], which is what we use to test our telemetry APIs.

[1] https://github.com/spulec/moto#stand-alone-server-mode
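As a rough illustration of the suggestion above, a test could point a boto3 client at a locally running moto server. This is only a sketch: the helper names are hypothetical, the endpoint assumes the server listens on localhost port 5000, and the credentials are dummy values chosen for the example.

```python
def local_s3_kwargs(endpoint_url="http://localhost:5000"):
    """Keyword arguments for boto3.client("s3", ...) aimed at a local server.

    Assumption: a stand-alone moto server is already running at
    `endpoint_url`; the credentials below are placeholders.
    """
    return {
        "endpoint_url": endpoint_url,
        "aws_access_key_id": "testing",
        "aws_secret_access_key": "testing",
        "region_name": "us-east-1",
    }


def make_client(endpoint_url="http://localhost:5000"):
    """Create an S3 client that talks to the local server instead of AWS."""
    import boto3  # imported lazily so local_s3_kwargs works without boto3

    return boto3.client("s3", **local_s3_kwargs(endpoint_url))


# Example use inside a test, once the server is up:
#   client = make_client()
#   client.create_bucket(Bucket="test-bucket")
#   client.put_object(Bucket="test-bucket", Key="churn/part-0", Body=b"...")
```

Keeping the endpoint in a single helper means the same test code could later be pointed at a different S3-compatible server by changing one argument.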

Thomas Huelbert

Updated

7 years ago

Component: Metrics: Pipeline → Datasets: General

Product: Cloud Services → Data Platform and Tools

Anthony Miyaguchi [:amiyaguchi]

Reporter

Updated

7 years ago

Status: NEW → RESOLVED

Closed: 7 years ago

Resolution: --- → WONTFIX

Nobody; OK to take it and work on it

Assignee

Updated

2 years ago

Component: Datasets: General → General

Categories

(Data Platform and Tools :: General, enhancement, P2)
