Closed
Bug 1355708
Opened 7 years ago
Closed 7 years ago
Scheduled ATMO job for probe-scraper not run
Categories
(Data Platform and Tools Graveyard :: Telemetry Analysis Service (ATMO), enhancement)
x86
macOS
Tracking
(Not tracked)
RESOLVED
WORKSFORME
People
(Reporter: gfritzsche, Unassigned)
Details
I noticed that the data from the probe-scraper [1] job hasn't been updated since Apr 6. From what I see in ATMO, it hasn't run since then:
> Identifier: gfritzsche-telemetry-probe-scraper
> Notebook name: load_and_run.ipynb
> Result visibility: Public
> Cluster size: 1
> Run interval: 24 hours
> Job timeout: 1
> Start date: 2017-04-03 07:00
> Last scheduled date: 2017-04-06 14:55
> Last run date: n/a
> Last terminated date: 2017-04-08 00:25
> Is enabled
1: https://github.com/mozilla/probe-scraper
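The job details quoted above are enough to flag the problem mechanically: with a 24-hour run interval, a "last scheduled date" that is several intervals old means scheduling has stalled. A minimal sketch of such a check in Python follows. The helper name `elapsed_intervals` and the `now` timestamp are assumptions made for illustration; this is not ATMO's actual code.

```python
from datetime import datetime, timedelta


def elapsed_intervals(last_scheduled, run_interval, now):
    """Return how many full run intervals have passed since `last_scheduled`.

    Hypothetical helper, not part of ATMO. 0 means the job is still inside
    its current interval; 1 or more means at least one scheduling was skipped.
    """
    if run_interval <= timedelta(0):
        raise ValueError("run_interval must be positive")
    elapsed = now - last_scheduled
    if elapsed < timedelta(0):
        return 0
    # Dividing two timedeltas with // yields an int.
    return elapsed // run_interval


# Values taken from the job details above; `now` is an assumed observation time.
last_scheduled = datetime(2017, 4, 6, 14, 55)
run_interval = timedelta(hours=24)
now = datetime(2017, 4, 12, 9, 0)

print(elapsed_intervals(last_scheduled, run_interval, now))
```

Any result above 0 for an enabled job would be worth an alert.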
Reporter
Updated•7 years ago
Summary: Scheduled ATMO job not run → Scheduled ATMO job for probe-scraper not run
Comment 1•7 years ago
Marc mentioned something similar yesterday as well, in https://github.com/mozilla/telemetry-analysis-service/issues/385, and I ran the job manually from the admin. The workers seem to be stuck: the job I started wasn't updated on ATMO either. In other words, the cluster status wasn't pulled from AWS and written to the ATMO db. :robotblake Can you restart the prod cluster and see if that unclogs the system?
Flags: needinfo?(bimsland)
Comment 2•7 years ago
Just restarted the scheduler and the workers on all nodes. Not sure if it's a similar issue, but on Redash we've run into cases where Celery workers (using the Redis backend) get into a strange state where they'll ack a job, hang without processing said job, and then never accept new work. There were similar bugs filed against Celery on GitHub that got closed with the release of v4, but that may not have been entirely resolved. I can dig up references if need be.
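The failure mode described here (a worker acks a job, then neither finishes it nor accepts new work) can in principle be spotted from outside the worker by looking for tasks that were acknowledged long ago and never finished. The sketch below is plain Python over hypothetical task records. The `TaskRecord` type, its fields, and the `max_runtime` threshold are assumptions for illustration, not part of Celery, Redash, or ATMO.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class TaskRecord:
    # Hypothetical record; the application would have to collect these
    # timestamps itself, for example in its own database.
    task_id: str
    acked_at: datetime
    finished_at: Optional[datetime] = None


def find_stuck_tasks(tasks: List[TaskRecord], now: datetime,
                     max_runtime: timedelta) -> List[str]:
    """Return ids of tasks acked more than `max_runtime` ago that never finished."""
    return [
        task.task_id
        for task in tasks
        if task.finished_at is None and now - task.acked_at > max_runtime
    ]


# Made-up sample data: one finished task, one hung task, one still in progress.
now = datetime(2017, 4, 12, 12, 0)
tasks = [
    TaskRecord("a", datetime(2017, 4, 12, 9, 0), datetime(2017, 4, 12, 9, 5)),
    TaskRecord("b", datetime(2017, 4, 12, 9, 0)),
    TaskRecord("c", datetime(2017, 4, 12, 11, 55)),
]
print(find_stuck_tasks(tasks, now, timedelta(hours=1)))
```

A non-empty result would be a hint that a worker needs a restart, which is what was done by hand here.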
Flags: needinfo?(bimsland)
Comment 3•7 years ago
:robotblake: So what could we do to monitor Celery? Add CPU/memory monitors via CloudWatch? Datadog? https://deadmanssnitch.com/? https://healthchecks.io/?
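The last two services work as a dead man's switch: the monitored job checks in periodically, and an alert fires when a check-in is overdue. A minimal in-process sketch of that idea follows, assuming a hypothetical `HeartbeatMonitor` class. It is not the API of either service, and the interval and grace values are made up.

```python
from datetime import datetime, timedelta


class HeartbeatMonitor:
    """Dead man's switch: reports overdue when check-ins stop arriving.

    Hypothetical illustration only, not the API of any monitoring service.
    """

    def __init__(self, interval, grace, started_at):
        self.interval = interval      # expected time between check-ins
        self.grace = grace            # extra slack before alerting
        self.last_seen = started_at   # creation counts as the first check-in

    def check_in(self, when):
        # Called by the monitored job each time it completes a run.
        self.last_seen = when

    def is_overdue(self, now):
        return now - self.last_seen > self.interval + self.grace


start = datetime(2017, 4, 6, 14, 55)
monitor = HeartbeatMonitor(timedelta(hours=24), timedelta(hours=1), start)
monitor.check_in(start)

print(monitor.is_overdue(start + timedelta(hours=24)))  # still within grace
print(monitor.is_overdue(start + timedelta(hours=26)))  # check-in missed
```

If the check-in is issued from a Celery task, a missed check-in also covers the case where workers are up but not processing anything, which CPU or memory monitors alone might not catch.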
Reporter
Comment 4•7 years ago
I rescheduled the job "gfritzsche-telemetry-probe-scraper", but it still did not run. How can I get this working?
Flags: needinfo?(jezdez)
Reporter
Comment 5•7 years ago
Ok, this is confusing:
- the output suggests something did run [1]
- ATMO says "last run date: N/A" [2]
- the output data suggests it did not run since my last manual run [3]
1: https://nbviewer.jupyter.org/url/s3-us-west-2.amazonaws.com/telemetry-public-analysis-2/gfritzsche-telemetry-probe-scraper/data/load_and_run.ipynb
2: https://analysis.telemetry.mozilla.org/jobs/104/#results
3: https://analysis-output.telemetry.mozilla.org/probe-scraper/data/general.json
Comment 6•7 years ago
Still valid, Jannis?
Comment 7•7 years ago
I suspect this has been resolved... Georg, can you confirm whether or not this is still a problem? Thanks!
Flags: needinfo?(jezdez) → needinfo?(gfritzsche)
Reporter
Comment 8•7 years ago
It works fine for me now.
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(gfritzsche)
Resolution: --- → WORKSFORME
Updated•4 years ago
Product: Data Platform and Tools → Data Platform and Tools Graveyard