python_moztelemetry installing wrong version of happybase on ATMO

Status

Product: Cloud Services
Component: Metrics: Pipeline
Priority: P1
Severity: normal
Status: RESOLVED FIXED
Reported: 8 months ago
Last modified: 8 months ago

People

(Reporter: harter, Assigned: wlach)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

(Reporter)

Description

8 months ago
When trying to use HBase views on default ATMO clusters I get the following error:

Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/mnt/yarn/usercache/hadoop/appcache/application_1491224809382_0013/container_1491224809382_0013_01_000005/pyspark.zip/pyspark/worker.py", line 172, in main
    process()
  File "/mnt/yarn/usercache/hadoop/appcache/application_1491224809382_0013/container_1491224809382_0013_01_000005/pyspark.zip/pyspark/worker.py", line 167, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/mnt/yarn/usercache/hadoop/appcache/application_1491224809382_0013/container_1491224809382_0013_01_000005/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "/mnt/anaconda2/lib/python2.7/site-packages/moztelemetry/hbase.py", line 166, in _get_range
    columns=[self.column_family], reverse=reverse):
TypeError: scan() got an unexpected keyword argument 'reverse'

        at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193)
        at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234)
        at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152)
        at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
        at org.apache.spark.scheduler.Task.run(Task.scala:86)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        ... 1 more
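
The failing frame is moztelemetry's call into happybase's Table.scan(). A minimal standalone sketch that hits the same TypeError against any happybase release whose scan() has no `reverse` argument (host, table, and column family names below are placeholders, and it assumes an HBase Thrift server is reachable):

import happybase

# Placeholder connection details -- not the cluster's real HBase endpoint.
connection = happybase.Connection('localhost')
table = connection.table('example_table')

# On a happybase without reverse-scan support this raises:
#   TypeError: scan() got an unexpected keyword argument 'reverse'
# On a happybase that has it, the scan simply returns rows in reverse order.
rows = list(table.scan(columns=['cf'], reverse=True))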



As far as I can tell, the issue is that we're getting the wrong version of happybase. We specify an alternate dependency link for happybase in python_moztelemetry's setup.py [0], but it looks like that dependency link is not being honored when the package is installed on default ATMO clusters.

[0] https://github.com/mozilla/python_moztelemetry/blob/master/setup.py#L22
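
For reference, the relevant part of setup.py boils down to something like the sketch below (simplified; the happybase pin is inferred from the #egg fragment, and everything unrelated is omitted):

from setuptools import setup

setup(
    name='python_moztelemetry',
    # ... other metadata and requirements omitted ...
    install_requires=[
        'happybase==1.0.1',  # pin inferred from the #egg fragment below
    ],
    # Intended to make the installer fetch happybase from a pinned git
    # archive rather than PyPI. pip has ignored dependency_links by default
    # for a long time (it needs --process-dependency-links), so a stock
    # ATMO install ends up with whatever happybase release PyPI serves.
    dependency_links=[
        'https://github.com/wbolster/happybase/archive/'
        '33b7700375ba59f1810c30c8cd531577b0718498.zip#egg=happybase-1.0.1',
    ],
)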
It looks like dependency_links has long been deprecated and isn't processed by pip by default:

http://serverfault.com/a/628714

It appears we're using it to pick up changes that we depend on but that aren't in any released version of happybase. I'll file an issue asking the author to cut a new release. Meanwhile, is there a way you can install this specific version of happybase on ATMO?

https://github.com/wbolster/happybase/archive/33b7700375ba59f1810c30c8cd531577b0718498.zip#egg=happybase-1.0.1
Flags: needinfo?(rharter)
Filed: wbolster/happybase#164
(In reply to William Lachance (:wlach) (use needinfo!) from comment #2)
> Filed: wbolster/happybase#164

This is now fixed so we can just depend on happybase 1.1.0, I think.
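
Something along these lines should be all that's needed (a sketch of the direction only; the attached PR is the actual change):

from setuptools import setup

setup(
    name='python_moztelemetry',
    # ... other metadata and requirements unchanged ...
    install_requires=[
        'happybase>=1.1.0',  # released version includes reverse-scan support
    ],
    # dependency_links entry dropped; no longer needed.
)
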
Flags: needinfo?(rharter)
Created attachment 8854192 [details] [review]
PR
Status: NEW → RESOLVED
Last Resolved: 8 months ago
Resolution: --- → FIXED