Closed Bug 1542388 Opened 5 years ago Closed 5 years ago

rewrite archivescraper as django command

Categories

(Socorro :: General, task, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: willkg, Assigned: willkg)

Attachments

(2 files)

archivescraper is a crontabber job. Code is here:

https://github.com/mozilla-services/socorro/blob/master/socorro/cron/jobs/archivescraper.py

This bug covers rewriting archivescraper as a Django command that runs at a scheduled time using the Django cronrun command. Also, rewrite the tests.

Making this a P2 to do soon.

Priority: -- → P2
Depends on: 1493687

This deployed to stage yesterday. I looked at the records and it's running. It looks like it takes about half as long to run as it did before. I'm not sure why that would be.

Because we're not running it in verbose mode, it's not clear exactly what it's doing. I think I want to adjust the logging a bit.

Also, the subprocesses are printing to stdout, which I don't think is getting picked up by the main process and converted to mozlog format.

Three things to look into:

  1. adjust logging so we know what trees it's traversing
  2. log how many builds it found in addition to how many were successfully inserted
  3. see if we can get the subprocesses to send their output to the main process for mozlog formatting
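One way to approach the three items above is to have each worker return its messages along with its results, so the main process does all the logging and counting. The following is a minimal sketch under stated assumptions, not the code from the actual fix: `FAKE_ARCHIVE`, `worker`, `scrape`, and the `map_fn` parameter are hypothetical names, and the archive listing is faked with a dict rather than fetched over HTTP.

```python
import logging

logger = logging.getLogger("crashstats.cron")

# Hypothetical stand-in for directory listings on the archive site.
FAKE_ARCHIVE = {
    "/pub/firefox/candidates/66.0-candidates/build1/": ["buildhub.json"],
    "/pub/firefox/candidates/60.0esr-candidates/build3/": [],
}


def worker(path):
    """Look for json files under path; return builds and messages, never print."""
    messages = []
    builds = [
        (path, name)
        for name in FAKE_ARCHIVE.get(path, [])
        if name.endswith(".json")
    ]
    if not builds:
        messages.append("worker: could not find json files in: %s" % path)
    return builds, messages


def scrape(root, paths, known, map_fn=map):
    """Log the tree being traversed, worker messages, and found/inserted counts."""
    logger.info("archivescraper: scrape_candidates working on %s", root)
    found = 0
    inserted = 0
    for builds, messages in map_fn(worker, paths):
        # Worker messages are logged here, in the main process, so they go
        # through the same logging configuration as everything else.
        for msg in messages:
            logger.info("archivescraper: %s", msg)
        for build in builds:
            found += 1
            if build not in known:
                known.add(build)
                inserted += 1
    logger.info(
        "archivescraper: found %s builds; inserted %s builds", found, inserted
    )
    return found, inserted
```

Because `worker` is a top-level function that returns plain tuples and lists, the `map` method of a `concurrent.futures.ProcessPoolExecutor` can be passed as `map_fn` to run the workers in subprocesses without changing the logging path.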

Also, I broke the local dev environment when removing the crontabber job. Need to fix that, too.

willkg merged PR #4915: "bug 1542388: archivescraper fixes" in a8eb3c8.

This fixes all of the above. I'll wait for this to deploy to stage and run there, and then I'll see how I feel about everything.

Works perfectly on stage now:

Demozlogged mozlog-formatted logs:

2019-04-29T19:23:04.682184  INFO  crashstats.cron: about to run archivescraper
2019-04-29T19:23:04.764408  INFO  crashstats.cron: archivescraper: scrape_candidates working on /pub/firefox/candidates/
2019-04-29T19:23:05.311038  INFO  crashstats.cron: archivescraper: skipping anything before Firefox and not esr (63)
2019-04-29T19:23:31.480972  INFO  crashstats.cron: archivescraper: worker: could not find json files in: /pub/firefox/candidates/52.0.1esr-candidates/build1/
2019-04-29T19:23:31.481236  INFO  crashstats.cron: archivescraper: worker: could not find json files in: /pub/firefox/candidates/60.0esr-candidates/build3/
2019-04-29T19:23:31.481343  INFO  crashstats.cron: archivescraper: worker: could not find json files in: /pub/firefox/candidates/60.0esr-candidates/build4/
2019-04-29T19:23:31.481448  INFO  crashstats.cron: archivescraper: worker: could not find json files in: /pub/firefox/candidates/60.2.2esr-candidates/build1/
2019-04-29T19:23:31.481676  INFO  crashstats.cron: archivescraper: worker: could not find json files in: /pub/firefox/candidates/60.3.0esr-candidates/build2/
2019-04-29T19:23:31.481804  INFO  crashstats.cron: archivescraper: worker: could not find json files in: /pub/firefox/candidates/63.0.1-candidates/build3/
2019-04-29T19:23:31.481905  INFO  crashstats.cron: archivescraper: worker: could not find json files in: /pub/firefox/candidates/65.0b3-candidates/build1/
2019-04-29T19:23:31.481997  INFO  crashstats.cron: archivescraper: worker: could not find json files in: /pub/firefox/candidates/67.0b15-candidates/build1/
2019-04-29T19:23:32.680190  INFO  crashstats.cron: archivescraper: found 1308 builds; inserted 0 builds
2019-04-29T19:23:32.683002  INFO  crashstats.cron: archivescraper: scrape_candidates working on /pub/devedition/candidates/
2019-04-29T19:23:33.517644  INFO  crashstats.cron: archivescraper: skipping anything before DevEdition and not esr (63)
2019-04-29T19:23:48.162532  INFO  crashstats.cron: archivescraper: found 734 builds; inserted 1 builds
2019-04-29T19:23:48.165235  INFO  crashstats.cron: archivescraper: scrape_candidates working on /pub/mobile/candidates/
2019-04-29T19:23:48.912092  INFO  crashstats.cron: archivescraper: skipping anything before Fennec and not esr (63)
2019-04-29T19:23:57.337756  INFO  crashstats.cron: archivescraper: found 393 builds; inserted 1 builds
2019-04-29T19:23:57.338001  INFO  crashstats.cron: archivescraper: Done!
2019-04-29T19:23:57.338125  INFO  crashstats.cron: successfully ran archivescraper on 2019-04-29 19:23:04.685174+00:00
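
To check a run like the one above without reading every line, the summary lines can be totaled and the elapsed time taken from the first and last timestamps. This is a rough sketch that assumes the demozlogged line layout shown in this comment; `summarize` and `SAMPLE` are hypothetical names, and `SAMPLE` is a subset of the lines above.

```python
import re
from datetime import datetime

SUMMARY_RE = re.compile(r"found (\d+) builds; inserted (\d+) builds")

# Subset of the log lines above: the first line, the three summaries, the last line.
SAMPLE = """\
2019-04-29T19:23:04.682184  INFO  crashstats.cron: about to run archivescraper
2019-04-29T19:23:32.680190  INFO  crashstats.cron: archivescraper: found 1308 builds; inserted 0 builds
2019-04-29T19:23:48.162532  INFO  crashstats.cron: archivescraper: found 734 builds; inserted 1 builds
2019-04-29T19:23:57.337756  INFO  crashstats.cron: archivescraper: found 393 builds; inserted 1 builds
2019-04-29T19:23:57.338125  INFO  crashstats.cron: successfully ran archivescraper on 2019-04-29 19:23:04.685174+00:00
"""


def summarize(text):
    """Return (found, inserted, elapsed_seconds) for one run's log text."""
    found = 0
    inserted = 0
    stamps = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The timestamp is the first whitespace-separated field on each line.
        stamps.append(datetime.strptime(line.split()[0], "%Y-%m-%dT%H:%M:%S.%f"))
        match = SUMMARY_RE.search(line)
        if match:
            found += int(match.group(1))
            inserted += int(match.group(2))
    elapsed = (stamps[-1] - stamps[0]).total_seconds() if stamps else 0.0
    return found, inserted, elapsed


found, inserted, elapsed = summarize(SAMPLE)
print("found=%d inserted=%d elapsed=%.1fs" % (found, inserted, elapsed))
```

For this run that gives 2435 builds found and 2 inserted across the three trees, in a little under 53 seconds.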

This just went to prod.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED