Probe Scraper DAG failing: AirflowException: Pod Launching failed: Pod returned a failure: failed
Categories
(Data Platform and Tools :: General, defect)
Tracking
(Not tracked)
People
(Reporter: whd, Assigned: whd)
References
Details
Attachments
(1 file)
The last two days have failed. I initially attributed this to the GCP issues from yesterday but now I'm less sure. From the logs it's not clear to me whether this is an airflow issue or a problem with probe scraper. It appears that the pod launches then fails (suggesting it's an issue with probe scraper) but the logging isn't very descriptive and I'm not sure how to debug further.
Comment 1•6 years ago
:hwoo will take a look and assign to :frank if it's not an airflow infra issue.
Comment 2•6 years ago
:hwoo showed me how to find these logs, which aren't showing in the airflow UI due to a known issue. The actual issue with probe scraper is understood and :frank is taking care of it in https://bugzilla.mozilla.org/show_bug.cgi?id=1604919.
Comment 3•6 years ago
Can confirm, this hasn't been fixed yet. Tracked here: https://jira.mozilla.com/browse/EIS-1958
Comment 4•6 years ago
:frank, we're continuing to see issues with probe scraper, even though that JIRA issue is marked as fixed. The logs indicate some sort of git issue which could be related but looks different from before:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/runpy.py", line 193, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.8/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/app/probe_scraper/runner.py", line 449, in <module>
    main(args.cache_dir,
  File "/app/probe_scraper/runner.py", line 377, in main
    load_glean_metrics(cache_dir, out_dir, repositories_file, dry_run, glean_repo)
  File "/app/probe_scraper/runner.py", line 208, in load_glean_metrics
    commit_timestamps, repos_metrics_data, emails = git_scraper.scrape(cache_dir, repositories)
  File "/app/probe_scraper/scrapers/git_scraper.py", line 139, in scrape
    ts, commits = retrieve_files(repo_info, folder)
  File "/app/probe_scraper/scrapers/git_scraper.py", line 77, in retrieve_files
    hashes = get_commits(repo, rel_path)
  File "/app/probe_scraper/scrapers/git_scraper.py", line 36, in get_commits
    change_commits = enumerate(repo.git.log(log_format, filename).split('\n'))
  File "/usr/local/lib/python3.8/site-packages/git/cmd.py", line 542, in <lambda>
    return lambda *args, **kwargs: self._call_process(name, *args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/git/cmd.py", line 1005, in _call_process
    return self.execute(call, **exec_kwargs)
  File "/usr/local/lib/python3.8/site-packages/git/cmd.py", line 822, in execute
    raise GitCommandError(command, status, stderr_value, stdout_value)
git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
cmdline: git log --format="%H:%ct" toolkit/components/telemetry/fog/pings.yaml
stderr: 'fatal: ambiguous argument 'toolkit/components/telemetry/fog/pings.yaml': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]''
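The exit code 128 here means git couldn't resolve the argument: the file was removed from the repository, so `git log <path>` (without a `--` separator) treats it as an unknown revision. A minimal sketch of a guarded variant, with hypothetical names (`get_commits_safe` is not probe-scraper's actual API), that skips removed files instead of crashing:

```python
import subprocess
from pathlib import Path


def get_commits_safe(repo_dir, rel_path):
    """Hypothetical guarded variant of get_commits: return (hash, timestamp)
    pairs for a file, or an empty list if the file no longer exists in the
    working tree, instead of letting `git log` fail with exit code 128."""
    if not (Path(repo_dir) / rel_path).exists():
        # File was removed upstream (e.g. FOG's pings.yaml); nothing to scrape.
        return []
    # The '--' separator tells git the argument is a path, not a revision.
    out = subprocess.run(
        ["git", "log", "--format=%H:%ct", "--", rel_path],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    return [
        (commit_hash, int(timestamp))
        for line in out.splitlines() if line
        for commit_hash, timestamp in [line.split(":", 1)]
    ]
```

Even with the existence check, the `--` separator is worth keeping so git never misparses a path as a revision.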
Can you take a look?
Comment 5•6 years ago
Thanks :whd. It looks like fog took out their pings.yaml file. Jan-Erik, Chutten, where did we move this to?
Huh, we indeed did. I just did not realize this would fail the parser (but it's kinda obvious now).
FOG has been removed and therefore there is no pings.yaml anymore. I will remove that line from the config file.
That brings up the question: should probe-scraper hard-fail on these kinds of errors?
Comment 8•6 years ago
We don't have a good data deletion policy in place, and it sounds like the current solution will remove the schemas from our schemas repository, which will in turn remove these datasets from BigQuery. We should probably keep the last set of existing schemas in generated-schemas as they currently are (by whatever mechanism) until we've developed a better policy around source dataset deletion.
This said, if all relevant parties (including :mreid, whom I've CC'd) are fine with deleting this data (which sounds like prototype data), our deploy mechanisms will in fact delete (with manual operator approval) all source datasets for this data if the schemas are removed from the generated-schemas branch.
Calling in :chutten for that, but I don't see why we should keep the prototype data around.
Comment 11•6 years ago
:mreid, is the deletion of prototype data from production an acceptable course of action here?
Comment 12•6 years ago
Yes, in this case deleting the data is acceptable. I don't think this is generally the case when removing an application, schema, or other artifact; let's revisit if/when there's another case.
Comment 13•6 years ago
> That brings up the question: should probe-scraper hard-fail on these kinds of errors?
We do want to hard-fail in the sense that the error is apparent and we deal with it. However, we don't need to block schema deploys for every error; what should happen is that the affected schema is simply not updated. How to handle this is a broader and more difficult discussion.
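The "fail loudly, but only skip the affected schema" behaviour described above could be sketched like this (hypothetical names; `scrape_all` and `scrape_one` are not probe-scraper's actual API): scrape every repo, record per-repo failures, and let the caller exit non-zero at the end so the error is still apparent without blocking the deploys for unaffected schemas.

```python
def scrape_all(repos, scrape_one):
    """Scrape each repo with scrape_one(repo) -> metrics dict.

    Per-repo failures (e.g. a GitCommandError for a removed file) are
    collected instead of aborting the whole run; the caller can publish
    the successful results and then fail loudly if errors is non-empty.
    """
    results, errors = {}, {}
    for repo in repos:
        try:
            results[repo] = scrape_one(repo)
        except Exception as exc:
            # Skip this repo: its schema is simply not updated this run.
            errors[repo] = exc
    return results, errors
```

The key design choice is that an exception in one repo degrades only that repo's output; surfacing the collected errors (and exiting non-zero) keeps the failure visible to operators.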
Comment 14•6 years ago
This specific issue (actually 2 issues) has been resolved. The broader and more difficult discussion is a subset of the general discussion of how to deal with data deletion, and can be tracked separately.