Enable predictions for RegretsReporter video similarity model
Categories
(Data Platform and Tools :: General, task)
Tracking
(Not tracked)
People
(Reporter: jessed, Unassigned)
Details
We need to make predictions using a PyTorch model. I've put the input data in regrets-reporter-dev.regrets_reporter_analysis.for_pred. I can provide the model.
The model will be updated daily for a couple of weeks, and predictions should ideally be regenerated each day.
Running this in Colab is infeasible: the data will not fit in memory, so predictions must be batched, including fetching the data from BigQuery, and that is too slow for the large number of batches (at least 10,000, I think) that would be needed.
Maybe we could move the data from BQ to a GCS bucket and run the model on cloud compute that can read from that bucket?
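To make the bucket idea concrete, here is a minimal sketch of streaming exported files in fixed-size batches so that only one batch is in memory at a time. It assumes the table has been exported as newline-delimited JSON; the function names and the predict_batch callable (a stand-in for the PyTorch model) are hypothetical and not part of any existing pipeline.

```python
import json
from itertools import islice
from pathlib import Path
from typing import Callable, Iterable, Iterator


def read_records(paths: Iterable[Path]) -> Iterator[dict]:
    """Lazily yield one record per non-empty line of newline-delimited JSON files."""
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                if line.strip():
                    yield json.loads(line)


def iter_batches(records: Iterable[dict], batch_size: int) -> Iterator[list]:
    """Group records into lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    iterator = iter(records)
    # islice pulls only the next batch_size records from the stream.
    while batch := list(islice(iterator, batch_size)):
        yield batch


def predict_all(
    records: Iterable[dict],
    predict_batch: Callable[[list], Iterable],  # hypothetical model wrapper
    batch_size: int = 1024,
) -> Iterator:
    """Apply predict_batch to each batch in turn and yield its outputs."""
    for batch in iter_batches(records, batch_size):
        yield from predict_batch(batch)
```

With files copied locally or mounted from the bucket, something like predict_all(read_records(paths), model_wrapper) would run the whole set without loading it at once.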
Comment 1•4 years ago
Using a notebook will take 38 days to process this amount of data.
There are a few options for accomplishing the processing:
- The high-level option would be Cloud Functions; however, there may be issues installing the required packages.
- Using App Engine or Compute Engine. This will require more effort because the work has to be distributed for the predictions to complete within 1 day, but it may be unavoidable.
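As a rough sizing check for the distributed option, the 38-day notebook estimate and the 1-day deadline imply a minimum worker count. The sketch below assumes each worker matches the notebook's throughput; the efficiency parameter is a hypothetical allowance for coordination overhead, not a measured value.

```python
import math


def workers_needed(serial_days: float, deadline_days: float, efficiency: float = 1.0) -> int:
    """Smallest number of equal workers that finishes serial_days of work in deadline_days.

    efficiency is the assumed fraction of ideal speed-up actually achieved (0 < efficiency <= 1).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return math.ceil(serial_days / (deadline_days * efficiency))


print(workers_needed(38, 1))       # 38 workers with perfect scaling
print(workers_needed(38, 1, 0.8))  # 48 workers at 80% efficiency
```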
As Jesse mentioned, loading the batches from BigQuery may be too slow. In addition, since the predictions will be run daily for a couple of weeks, it may make sense to pre-process the data into chunks.
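One possible shape for the chunking step, sketched under the assumption that rows can be addressed by position: compute the chunk boundaries once, then reuse the same chunk-to-worker assignment on each daily run. The helper names are hypothetical, and the chunk size would need tuning against worker memory.

```python
def chunk_ranges(total_rows: int, chunk_size: int) -> list:
    """Return half-open (start, end) row ranges that together cover total_rows."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    return [
        (start, min(start + chunk_size, total_rows))
        for start in range(0, total_rows, chunk_size)
    ]


def assign_chunks(num_chunks: int, num_workers: int) -> list:
    """Assign chunk indices to workers round-robin; entry w lists worker w's chunks."""
    if num_workers < 1:
        raise ValueError("num_workers must be positive")
    return [list(range(worker, num_chunks, num_workers)) for worker in range(num_workers)]


ranges = chunk_ranges(10, 4)
print(ranges)                         # [(0, 4), (4, 8), (8, 10)]
print(assign_chunks(len(ranges), 2))  # [[0, 2], [1]]
```

Because the input data stays fixed while only the model changes, the chunk files would need to be written once and could be read by every subsequent run.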
Comment 2•9 months ago
Hello,
The Mozilla Data Engineering organization is currently going through our extensive backlog, consisting of hundreds of issues stretching back nearly 10 years. We've done a pass through all of the open Bugzilla bugs and have identified and tagged the ones that we think are relevant enough to still need attention. The rest, including the bug with which this comment is associated, we are closing as "WONTFIX" in a single bulk operation.
If you feel we have closed this (or any) issue in error, please feel free to take the following actions:
- Reopen the bug.
- Edit the bug to add the string [dataplatform] (including the brackets) to the Whiteboard field. (Note that you must edit the Whiteboard, not the similarly named QA Whiteboard.)
Doing this will ensure that we see the bug in our weekly triage process, where we will decide how to proceed.
Thank you.