Closed Bug 1478741 Opened 7 years ago Closed 7 years ago

ftpscraper is ruining my dev life

Categories

(Socorro :: General, task, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: willkg, Assigned: willkg)

Details

Attachments

(1 file)

Socorro requires build information for products in order for a bunch of it to work (processor, webapp, etc). Thus to bootstrap a Socorro local development environment, one has to run ftpscraper. This takes about 20 minutes.

If one is slicking and rebuilding an environment on a regular basis--perhaps one is a core developer--then one wastes GOBS of time waiting for ftpscraper to run. So much time that one sometimes spends it thinking of new and devious ways one could get around running it at all.

That led to bug #1395681, which cached .json files on disk. For some reason, that's not working very well and ftpscraper still takes like 20 minutes to run. Further, that cache is large enough, both in size and in number of files, that it causes other problems.

In some cases, I don't actually care what ftpscraper generates. Having most of the builds, rather than all of the builds including the latest nightly builds, is usually fine.

While complaining about this for the nth time, I had an idea. Maybe we could write a wrapper script that saves the contents of the tables to a cache after ftpscraper runs and, if such data exists, offers to load it into the db and skip ftpscraper? That'd be soooo much faster and save me gobs of time.

This bug covers looking into that idea.
Making this a P2. I don't have time to work on this, but it'd save so much time that I might as well pretend I do have time to work on it and try to get to it soon unless someone beats me to it.
Priority: -- → P2
Grabbing this--I worked out a decent implementation last night. The wrapper script runs ftpscraper and captures the log file. If the log file is less than 7 days old, it'll replay the log file rather than run ftpscraper again.

Backing up a bit, I thought ftpscraper took 20 minutes, but actually it takes about 10 minutes on my machine. I'm pretty sure I've seen it take longer to run on occasion. It's very network intensive, so if archive.mozilla.org is slow to answer, then that slows the script down.

With the wrapper, the first run takes 10 minutes and subsequent runs take 10 seconds.

Forcing a run involves remembering there's a cache file and removing it. That's not discoverable. We could add a flag somewhere, but we'd have to thread it through a bunch of things (Makefile, bash script, docker-compose run) and document it, and it still wouldn't be very discoverable.

For now, I'm going to add removing the .cache directory to "make clean". That seems like a good enough pass on something that affects devs only.
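For anyone skimming, here is a rough sketch of the run-or-replay logic described above. It is not the code that landed in Socorro. The function names, the log path, and the `replay` callable are all hypothetical, and what "replaying" the log actually does is left to the caller. The only details taken from this bug are the 7-day cutoff and that removing the cached log forces a fresh run.

```python
import subprocess
import sys
import time
from pathlib import Path

# 7 days, the cutoff mentioned in the comment above.
MAX_AGE_SECONDS = 7 * 24 * 60 * 60


def is_fresh(log_path, max_age=MAX_AGE_SECONDS, now=None):
    """Return True if log_path exists and is younger than max_age seconds."""
    path = Path(log_path)
    if not path.is_file():
        return False
    now = time.time() if now is None else now
    return (now - path.stat().st_mtime) < max_age


def run_or_replay(cmd, log_path, replay, max_age=MAX_AGE_SECONDS):
    """Run cmd and save its stdout, or replay a saved log that is still fresh.

    `replay` is a hypothetical callable that receives the saved log text.
    Returns "replayed" or "ran" so callers can tell which path was taken.
    """
    path = Path(log_path)
    if is_fresh(path, max_age):
        replay(path.read_text())
        return "replayed"

    # Slow path: run the real command. check=True raises on failure, so a
    # failed run never leaves a log behind to be replayed later.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(result.stdout)
    return "ran"
```

Called as something like `run_or_replay(["ftpscraper-command"], ".cache/ftpscraper.log", load_into_db)`, where both the command and `load_into_db` are placeholders, the first call takes the slow path and later calls within 7 days take the fast one. Deleting the log file forces a fresh run, which is what wiping the .cache directory in "make clean" amounts to.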
Assignee: nobody → willkg
Status: NEW → ASSIGNED
Commits pushed to master at https://github.com/mozilla-services/socorro

https://github.com/mozilla-services/socorro/commit/c702cab2c434ea8ea7123b8e4c31b38817679ad1
bug 1478741: remove ftpscraper file caching

I don't think the file caching ever worked right. This removes it.

https://github.com/mozilla-services/socorro/commit/a1fc0389db6fa3721a24a4712327a1b86a5fa6a0
fix bug 1478741: add ftpscraper wrapper to make it FASTER

This adds an ftpscraper wrapper to be used in the local development environment that takes a lot of pain out of slicking-and-rebuilding a local development environment. It keeps a log of ftpscraper and replays it if the log is less than 7 days old. "make clean" will wipe the log. That covers most of my use cases.

https://github.com/mozilla-services/socorro/commit/d8cf7f63afe9bdf3cd02ba7512ec6e5d7c51a8be
Merge pull request #4526 from willkg/1478741-ftpscraper-wrapper

fix bug 1478741: add ftpscraper wrapper
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED