Probe Scraper failing since Aug 30th 2021
Categories
(Data Platform and Tools :: General, defect, P1)
Tracking
(Not tracked)
People
(Reporter: wlach, Unassigned)
Details
Attachments
(2 files)
The probe-scraper run hasn't run successfully since Aug 30th 2021. It appears that part of the problem was an incorrectly applied PR which caused it to silently fail, which is now fixed:
https://github.com/mozilla/probe-scraper/pull/334
However, the most recent re-run has failed (non silently this time, at least!):
Unfortunately we don't have logs in airflow of what happened due to mozilla/telemetry-airflow#844. IIRC the workaround is to use stackdriver logging, but I don't think I have access to that. :whd, would you be able to retrieve some logs to tell us what might be happening?
| Reporter | ||
Updated•4 years ago
|
Comment 1•4 years ago
|
||
You should be able to see these logs in stackdriver via e.g. https://cloudlogging.app.goo.gl/bDosHSe2HVYJM12b7. It looks like the third attempt of the job is running at with pod name probe-scraper-70738f73c8fd4f8b9aada7ecd34d2e27.
| Reporter | ||
Comment 2•4 years ago
|
||
Got an error log. It looks like we need to use cp rather than sync:
https://stackoverflow.com/questions/19745050/is-it-possible-to-sync-a-single-file-to-s3
Comment 3•4 years ago
|
||
| Reporter | ||
Comment 4•4 years ago
|
||
I've merged a fix but we're basically at my end of day now. I think I'll just wait for the probe scraper to run normally tonight. I expect it to complete successfully.
Comment 5•4 years ago
|
||
It seems probe-scraper and schema deploys ran tonight. The mps deploy dashboard reports successful runs.
However we did receive an errro email for try 1 out of 3 at 00:14 UTC with Pod Launching failed: Pod Launching failed: There was an error reading the kubernetes API.
Seems try 2 succeeded then I guess.
| Reporter | ||
Comment 6•4 years ago
|
||
(In reply to Jan-Erik Rediger [:janerik] from comment #5)
It seems probe-scraper and schema deploys ran tonight. The mps deploy dashboard reports successful runs.
However we did receive an errro email for try 1 out of 3 at 00:14 UTC withPod Launching failed: Pod Launching failed: There was an error reading the kubernetes API.
Seems try 2 succeeded then I guess.
I checked in stackdriver and didn't see anything unusual happen in the first run, it seems likely that was an intermittent problem with the kubernetes <-> airflow component. Looking at mps-deploys I suspect this sort of thing happens occasionally looking at the logs (likely we're seeing this problem in the entries where there is a delay in the graph):
| Reporter | ||
Comment 7•4 years ago
|
||
I checked in stackdriver and didn't see anything unusual happen in the first run, it seems likely that was an intermittent problem with the kubernetes <-> airflow component. Looking at mps-deploys I suspect this sort of thing happens occasionally looking at the logs (likely we're seeing this problem in the entries where there is a delay in the graph):
I just checked and the probe scraper job actually finished on schedule in the last few months (i.e. delays in deploys must have been due to other reasons).
I'm going to resolve this for now-- we can open a new bug if the problem recurs.
Description
•