Closed
Bug 684112
Opened 13 years ago
Closed 13 years ago
Add nightly & aurora FTP-scraping to releases_raw
Categories
(Socorro :: General, task)
Tracking
(Not tracked)
RESOLVED
FIXED
2.3.2
People
(Reporter: jberkus, Assigned: rhelmer)
References
Details
In 2.2.3 we're getting nightly/aurora information from the "builds" table. However, this table is slated to be deprecated, so we need to add them to releases_raw instead.
Comment 1•13 years ago
I guess bug 640242 should be able to hook into this, then.
Assignee
Updated•13 years ago
Target Milestone: 2.3 → 2.3.1
Assignee
Comment 2•13 years ago
First shot at rewriting the old FTP scraper: https://github.com/rhelmer/socorro/commit/f831575c70cd6336177ee35f31422f69cf258322

This is still heavily influenced by the old scraper, and I've hooked it into the "Socorro way" of doing things (unit tests, config, wrapper scripts, etc.; the unit test is based on the old scraper, for instance) since I want to make this week's freeze. Given more time, I'd like to think about how we can start moving toward something with less boilerplate and fewer socorro-isms.

I've tested that this seems to work (and put data into devdb for jberkus to review), and that the unit tests pass.

peterbe, lonnen, brandon, lars: any thoughts? I'm asking for feedback rather than r? since we can't pull this until we disable the old one, and I'd like to move the "nightly report" UI to use a matview based on releases_raw at the same time (jberkus is working on that now).
Status: NEW → ASSIGNED
Assignee
Comment 3•13 years ago
To be clear, the main bit of code that's being changed here is:
https://github.com/mozilla/socorro/blob/v2.2.4/socorro/cron/builds.py
Replaced by:
https://github.com/rhelmer/socorro/blob/f831575c70cd6336177ee35f31422f69cf258322/socorro/cron/ftpscraper.py
Assignee
Comment 4•13 years ago
One last thing: I've replaced the use of SGMLParser with BeautifulSoup. It's not included in the stdlib, but it tests OK for me with the RHEL-provided RPM, so deployment shouldn't be an issue; we just need to make sure the stage and prod puppet manifests get the new package.
Reporter
Comment 5•13 years ago
There's also a database schema change in upgrade/2.3.1/ associated with this bug.
Assignee
Comment 6•13 years ago
OK, I took peterbe's and lonnen's comments into account; I think this is ready to land:
r? https://github.com/mozilla/socorro/pull/73

Lots of little changes since comment #2, but the bigger ones are:
* switch from beautifulsoup to lxml.html
** the RHEL-provided RPM seems OK for this purpose; we just need parsing
* pep8 (for everything except schema.py)

This is blocking work we want to get into 2.3.1 (code freeze EOD tomorrow), and I have some other changes that need to go into this release. I'm totally happy to continue improving this, but I'd like to push anything non-trivial that doesn't need to block the ship to next week's release.
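For readers unfamiliar with what the scraper's parsing step has to do: it pulls entry links out of an HTML directory listing. The patch uses lxml.html; to keep this sketch self-contained it swaps in the stdlib `html.parser` module (the Python 3 successor to the removed `sgmllib`/SGMLParser), so it is an illustration of the idea, not the patch's code. `LinkCollector` and `list_dir_links` are hypothetical names, and the filter assumes an Apache-style listing with `?C=...` sort links and a parent-directory link.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in an HTML directory listing."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def list_dir_links(html_text):
    """Return entry links from a listing, skipping sort-order and parent links."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return [
        link
        for link in collector.links
        # Assumption: Apache-style listings add "?C=N;O=D" sort links and a
        # parent-directory link (absolute "/..." or relative "../").
        if not link.startswith(("?", "/", "../"))
    ]


if __name__ == "__main__":
    # Made-up listing for illustration only.
    sample = (
        '<a href="?C=N;O=D">Name</a>'
        '<a href="/pub/mozilla.org/firefox/">Parent Directory</a>'
        '<a href="2011-10-01-03-05-02-mozilla-central/">central</a>'
        '<a href="2011-10-01-04-20-01-mozilla-aurora/">aurora</a>'
    )
    print(list_dir_links(sample))
```

With lxml.html the collection step would collapse to an XPath query over `//a/@href`; the filtering logic stays the same.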
Comment 7•13 years ago
Commit pushed to https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/d872b44000fbc0b167139ca566e39e9c3b0cb8d3
Merge pull request #73 from rhelmer/bug684112-rewrite-ftp-scraper
rewrite FTP scraper to support nightly/aurora
Assignee
Comment 8•13 years ago
Here are a few supporting changes; r? anyone who has time:

Stop the old scraper from writing to the releases_raw table:
https://github.com/mozilla/socorro/pull/74
We're going to remove the old scraper entirely in 2.3.2; see bug 694466.

Add backfill support to ftpScraper.py:
https://github.com/mozilla/socorro/pull/75
Josh reminded me today that we need this for when we push the release. I did a little refactoring I had intended to do anyway as part of it (split up the nightly and release main loops).

The larger implied change here is that for nightlies, we're always going to look in http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2011/10/ instead of http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-* (which are actually symlinks to the above). It means we'll pick up a few more builds than we otherwise would have, but it should not be a big deal. In general, I'd rather pick up a few extra builds than put a bunch of special-casing in the code; scraping FTP to get this info is error-prone enough as it is.

Releases don't go into these dated dirs, and there are far fewer of them, so we automatically backfill whatever we can find on every run; that code doesn't need to change.
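The comment doesn't show how the nightly backfill walks the dated directories. A minimal sketch, assuming a backfill simply visits every `YEAR/MM/` directory between two dates under the nightly URL quoted above; `backfill_month_dirs` is a hypothetical name, and fetching and parsing each directory is left out.

```python
from datetime import date

# Dated nightly base URL taken from the comment above.
NIGHTLY_BASE = "http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly"


def backfill_month_dirs(start, end, base=NIGHTLY_BASE):
    """Return one dated nightly directory URL per month, start..end inclusive."""
    urls = []
    year, month = start.year, start.month
    # Compare (year, month) tuples so the day of month is ignored.
    while (year, month) <= (end.year, end.month):
        urls.append("%s/%04d/%02d/" % (base, year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return urls


if __name__ == "__main__":
    for url in backfill_month_dirs(date(2011, 11, 15), date(2012, 1, 3)):
        print(url)
```

Walking whole months is what makes the "a few extra builds" trade-off above: a run picks up everything in each visited month rather than special-casing the latest-* symlinks.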
Comment 9•13 years ago
Commit pushed to https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/562a0d2ae1deca8614aec07cc51e162e9b283969
Merge pull request #74 from rhelmer/bug684112-disable-old-scraper
bug 684112 - disable releases for old scraper
Comment 10•13 years ago
Commit pushed to https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/cc8c57e313097e05306b8768cd4b0c9fe59d83c7
Merge pull request #75 from rhelmer/bug684112-ftp-scraper-backfill
bug 684112 - add backfill support
Assignee
Updated•13 years ago
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Comment 11•13 years ago
Commits pushed to https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/b123e9e4a6f810327bc28292af9743ac93503f33
bug 684112 - easier to join if we leave a1/a2 in here

https://github.com/mozilla/socorro/commit/2dcfa9c18e20850caa204fa3242f41a824e67054
Merge pull request #85 from rhelmer/bug684112-fix-version-column
bug 684112 - easier to join if we leave a1/a2 in here
Comment 12•13 years ago
Commits pushed to https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/07452de6c55ab16836699b2a92bfba3001f010a9
bug 684112 - indentation is wrong here, want to run insertBuild regardless of nightly/aurora

https://github.com/mozilla/socorro/commit/7b144f544b8a2cd710feb8e58e45a2228bf712ff
Merge pull request #86 from rhelmer/bug684112-fix-indentation
bug 684112 - indentation is wrong here, want to run insertBuild regardles
Comment 13•13 years ago
Commits pushed to https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/5afd93df53c7a41eb4cc062705f30e563aa4c6ff
bug 684112 - easier to join if we leave a1/a2 in here

https://github.com/mozilla/socorro/commit/df6b2e12024308edad0d6460f1c6b2c97f6d87b8
bug 684112 - indentation is wrong here, want to run insertBuild regardless of nightly/aurora
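The scraper code itself isn't quoted in this bug, so the following is a hypothetical reconstruction of the class of bug the "indentation is wrong here" commit describes: in Python, a call indented one level too deep silently becomes part of a channel-specific branch. `scrape_buggy`, `scrape_fixed` and the `insert_build` callback are made-up names standing in for the real insertBuild call, and the channel check on directory names is an assumption.

```python
def scrape_buggy(build_dirs, insert_build):
    for name in build_dirs:
        channel = "nightly"
        if name.rstrip("/").endswith("mozilla-aurora"):
            channel = "aurora"
            # Bug: indented inside the `if`, so only aurora builds are inserted.
            insert_build(name, channel)


def scrape_fixed(build_dirs, insert_build):
    for name in build_dirs:
        channel = "nightly"
        if name.rstrip("/").endswith("mozilla-aurora"):
            channel = "aurora"
        # Fix: dedented one level, so every build is inserted whatever its channel.
        insert_build(name, channel)


if __name__ == "__main__":
    dirs = ["2011-10-01-mozilla-central/", "2011-10-01-mozilla-aurora/"]
    for scrape in (scrape_buggy, scrape_fixed):
        inserted = []
        scrape(dirs, lambda name, channel: inserted.append((name, channel)))
        print(scrape.__name__, inserted)
```

Passing the insert step in as a callback is only there so the difference is easy to observe; the real scraper writes to the database.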
Assignee
Comment 14•13 years ago
This depends on DB changes that were pulled from 2.3.1, so I'm bumping this and reopening. The old scraper (disabled in comment 9) should be re-enabled for 2.3.1.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: 2.3.1 → 2.3.2
Comment 15•13 years ago
Commits pushed to https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/4a0bfd54bd9d8d3067356baa5a68b4c5cd0a7297
bug 684112 - reinstate old release scraper until new DB changes are ready

https://github.com/mozilla/socorro/commit/626ad0533b1b5599d8fa287d7b9d0acb5e0d2725
Merge pull request #104 from rhelmer/bug684112-reinstate-old-scraper
bug 684112 - reinstate old release scraper until new DB changes are ready
Comment 16•13 years ago
Commit pushed to https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/f723ab74590beb7a93be5def0d313b1b66008628
bug 684112 - reinstate old release scraper until new DB changes are ready
Comment 17•13 years ago
Commit pushed to https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/f723ab74590beb7a93be5def0d313b1b66008628
bug 684112 - reinstate old release scraper until new DB changes are ready
Reporter
Comment 18•13 years ago
The DB changes for releases_raw are still in 2.3.1 and are ready. If there is a bug with them, that's a different matter.
Assignee
Comment 19•13 years ago
(In reply to Josh Berkus from comment #18)
> The DB changes for releases_raw are still in 2.3.1 and are ready.
>
> If there is a bug with them, that's a different matter.

I don't feel that this has had adequate testing to replace the old scraper yet, and I am out this week, so I can't help with any fallout. It's not really necessary to enable this until bug 684106 ships, so it's not worth the risk.
Assignee
Comment 20•13 years ago
This should be ready for 2.3.2; it was only backed out on the 2.3.1 branch.
Status: REOPENED → RESOLVED
Closed: 13 years ago → 13 years ago
Resolution: --- → FIXED
Assignee
Comment 21•13 years ago
There's a bug in the way single-digit months are handled (they should always be padded to two digits, since that's what the FTP directory layout expects). One-line, tested fix incoming.
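The one-line fix isn't quoted here; a minimal sketch of the difference, assuming the dated path segment was built with printf-style formatting (`month_path_buggy` and `month_path` are hypothetical names). Plain `%d` drops the leading zero, so September 2011 points at a `2011/9/` directory that doesn't exist, while `%02d` yields the `2011/09/` form used on the FTP site.

```python
def month_path_buggy(year, month):
    # Plain %d formatting: September 2011 becomes "2011/9".
    return "%d/%d" % (year, month)


def month_path(year, month):
    # %02d zero-pads the month, matching dated dirs such as "2011/09".
    return "%04d/%02d" % (year, month)


if __name__ == "__main__":
    print(month_path_buggy(2011, 9))  # 2011/9
    print(month_path(2011, 9))        # 2011/09
```

The same padding can be had with `"{:04d}/{:02d}".format(year, month)` or `date.strftime("%Y/%m")`; which one the patch used isn't shown in this bug.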
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 22•13 years ago
Commits pushed to https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/9b72f34d8838a33d9d0f3c6bb573a5067d61c9fa
bug 684112 - format month to two digits

https://github.com/mozilla/socorro/commit/fc5d8b348176aedea1f2510dd9edc88b161436e2
Merge pull request #130 from rhelmer/2.3.2
bug 684112 - format month to two digits
Assignee
Updated•13 years ago
Status: REOPENED → RESOLVED
Closed: 13 years ago → 13 years ago
Resolution: --- → FIXED
Comment 23•13 years ago
Commit pushed to https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/9b72f34d8838a33d9d0f3c6bb573a5067d61c9fa
bug 684112 - format month to two digits
Comment 24•13 years ago
Commit pushed to https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/a25fe165c30b289e1fd779f5e519aca900d78b85
Merge pull request #128 from rhelmer/bug684112-month-formatting-fix
bug 684112 - format month to two digits
Updated•13 years ago
Component: Socorro → General
Product: Webtools → Socorro