Closed Bug 1735530 Opened 4 years ago Closed 4 years ago

Airflow task probe_scraper.probe_scraper failing on 2021-10-13

Categories

(Data Platform and Tools :: General, defect)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: akomar, Assigned: relud)

Details

(Whiteboard: [airflow-triage])

Attachments

(2 files)

The Airflow task probe_scraper.probe_scraper failed on 2021-10-13.
The Airflow logs do not seem to contain any hints about what went wrong.
I will restart the task, but I'm filing a bug anyway since it runs for 2 hours.

The probe scraper DAG now has inline docs that direct people to the actual probe-scraper logs.

Assigning to :relud since they might have more context on fixing this. This looks related to a similar previous fix: https://github.com/mozilla/probe-scraper/pull/339

More context on how this happened is also in this Slack thread: https://mozilla.slack.com/archives/GE83ZMSAW/p1634148034056400?thread_ts=1634147125.054700&cid=GE83ZMSAW

Assignee: ascholtz → dthorn

One issue is fixed, but another was revealed:

fatal: path 'toolkit/components/glean/pings.yaml' exists on disk, but not in '340c8521a54ad4d4a32dd16333676a6ff85aaec2'

I'm currently testing a fix.
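
For reference, the git error above says the file is in the working tree but not in the tree of that commit. A minimal sketch of how a scraper could check for this before reading the file, assuming a local clone; the helper name `path_exists_at_commit` and its `run` parameter are hypothetical and not part of probe-scraper:

```python
import subprocess


def path_exists_at_commit(repo_dir, commit, path, run=subprocess.run):
    """Return True if `path` is present in the tree of `commit`.

    Hypothetical helper for illustration. `run` is injectable so the
    function can be exercised without a real git checkout.
    """
    # `git cat-file -e <rev>:<path>` exits with status 0 only when the
    # object exists, so no output needs to be parsed.
    result = run(
        ["git", "-C", repo_dir, "cat-file", "-e", f"{commit}:{path}"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return result.returncode == 0
```

A caller could skip (or log) a commit where this returns False instead of letting the git command abort the whole run. Whether skipping is the right behavior for probe-scraper is a separate question.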

Status: NEW → ASSIGNED

This is still broken, this time with:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/app/probe_scraper/runner.py", line 592, in <module>
    main(
  File "/app/probe_scraper/runner.py", line 499, in main
    load_glean_metrics(cache_dir, out_dir, repositories_file, dry_run, glean_repos)
  File "/app/probe_scraper/runner.py", line 360, in load_glean_metrics
    abort_after_emails |= glean_checks.check_for_duplicate_metrics(
  File "/app/probe_scraper/glean_checks.py", line 80, in check_for_duplicate_metrics
    dependencies = [repo.name] + [
  File "/app/probe_scraper/glean_checks.py", line 81, in <listcomp>
    repo_by_library_name[library_name] for library_name in repo.dependencies
KeyError: 'org.mozilla.components:browser-engine-gecko-release'

which I will attempt to address in the morning.
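
The traceback shows a plain dictionary lookup failing because a declared dependency has no matching entry in `repo_by_library_name`. One possible way to handle that, sketched under assumptions: the function name `resolve_dependencies` and the sample data in the usage note are hypothetical, and this is not necessarily what the eventual fix does.

```python
from typing import Dict, List, Tuple


def resolve_dependencies(
    repo_name: str,
    dependencies: List[str],
    repo_by_library_name: Dict[str, str],
) -> Tuple[List[str], List[str]]:
    """Map library names to repo names, collecting unknown libraries.

    Hypothetical sketch: instead of raising KeyError on the first unknown
    library, return the resolved repos plus a list of the missing names so
    the caller can decide whether to warn or abort.
    """
    resolved = [repo_name]
    missing = []
    for library_name in dependencies:
        if library_name in repo_by_library_name:
            resolved.append(repo_by_library_name[library_name])
        else:
            missing.append(library_name)
    return resolved, missing
```

For example, with a made-up mapping `{"lib-a": "repo-a"}`, calling `resolve_dependencies("app", ["lib-a", "org.mozilla.components:browser-engine-gecko-release"], mapping)` returns `["app", "repo-a"]` as resolved and reports the second library as missing rather than crashing.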

The last issue was fixed by https://github.com/mozilla/probe-scraper/pull/347, after which probe scraper ran successfully.

Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Component: Datasets: General → General