Closed Bug 1271719 Opened 8 years ago Closed 8 years ago

Firefox Engagement Ratio Spark job is failing

Categories

(Cloud Services Graveyard :: Metrics: Pipeline, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mreid, Assigned: mreid)

Details

Figure out why the job is failing and fix it.
The cause is not obvious from the log:

...snip...
Caused by: java.io.FileNotFoundException: /mnt/yarn/usercache/hadoop/appcache/application_1462853298787_0001/blockmgr-f4358145-c989-417a-bcce-d0bfcc1a1371/09/shuffle_23_12_0.index (No such file or directory)
	at java.io.FileInputStream.open(Native Method)
	at java.io.FileInputStream.<init>(FileInputStream.java:146)
	at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.getSortBasedShuffleBlockData(ExternalShuffleBlockResolver.java:275)
	... 27 more

	at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:323)
	at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:300)
	at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:51)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
Assignee: nobody → mreid
Points: --- → 1
Priority: -- → P1
There was a problem with the job that converts the Heka Executive Summary stream to Parquet, so the underlying derived dataset was missing some data.

I've switched the job to read from the main_summary dataset instead. The change is in this pull request:
https://github.com/mozilla-services/data-pipeline/pull/205

The dashboard data has already been fixed and backfilled.
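
For future reference, a gap like this in a derived dataset can be caught before a dependent job runs. Below is a minimal sketch, assuming the list of days present in the dataset has already been collected (for example from partition names). The helper `missing_days` and the sample dates are hypothetical and are not taken from the pipeline code.

```python
from datetime import date, timedelta


def missing_days(available, start, end):
    """Return the dates in [start, end] (inclusive) that are absent from `available`."""
    have = set(available)
    span = (end - start).days + 1  # number of days in the inclusive range
    expected = (start + timedelta(days=i) for i in range(span))
    return [d for d in expected if d not in have]


if __name__ == "__main__":
    # Illustrative dates only; not the actual days affected in this bug.
    days_present = [date(2016, 5, 1), date(2016, 5, 2), date(2016, 5, 4)]
    for d in missing_days(days_present, date(2016, 5, 1), date(2016, 5, 5)):
        print("missing:", d.isoformat())
```

A check like this could run ahead of the engagement ratio job and fail early with the list of missing days, which would likely be easier to diagnose than a shuffle fetch error.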
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Product: Cloud Services → Cloud Services Graveyard