Closed Bug 475970 (bugscrape): Opened 17 years ago, Closed 16 years ago

Save weekly csv files of buglists of changes onto a folder on the server

Categories

Product :: Component: Mozilla Messaging Graveyard :: Server Operations
Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: gkw, Unassigned)

Attachments

(6 files, 4 obsolete files)

The following are examples of queries from which to get the csv files. The first is TB, the second MailNews Core:

https://bugzilla.mozilla.org/buglist.cgi?query_format=advanced&product=Thunderbird&resolution=FIXED&chfieldfrom=2009-01-19+03%3A00&chfieldto=2009-01-26+03%3A00&chfield=resolution&chfieldvalue=Fixed&order=Bug+Number&ctype=csv
https://bugzilla.mozilla.org/buglist.cgi?query_format=advanced&product=MailNews+Core&resolution=FIXED&chfieldfrom=2009-01-19+03%3A00&chfieldto=2009-01-26+03%3A00&chfield=resolution&chfieldvalue=Fixed&order=Bug+Number&ctype=csv

https://bugzilla.mozilla.org/buglist.cgi?query_format=advanced&product=Thunderbird&resolution=FIXED&chfieldfrom=2009-02-02+03%3A00&chfieldto=2009-02-02+03%3A00&chfield=resolution&chfieldvalue=Fixed&order=Bug+Number&ctype=csv
https://bugzilla.mozilla.org/buglist.cgi?query_format=advanced&product=MailNews+Core&resolution=FIXED&chfieldfrom=2009-02-02+03%3A00&chfieldto=2009-02-02+03%3A00&chfield=resolution&chfieldvalue=Fixed&order=Bug+Number&ctype=csv

The only variation between these buglists is the date range: the one on top runs from 2009-01-19 03:00 PST (Bugzilla server time, i.e. PST) to 2009-01-26 03:00. So there should be 2 csv files for the week _ending_ 2009-01-26 03:00, one for TB and one for MailNews Core; calling them 20090126-tb.csv and 20090126-mailnews.csv should suffice. The next week, a cron job should pull and save the buglists 20090202-tb.csv and 20090202-mailnews.csv, and so on. These buglists should no longer be needed after a fortnight or so, but until the Python script that post-processes these csv files into readable HTML is finalized, I would like the deletions not to happen yet.
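As a rough illustration of the naming and query scheme described above, here is a minimal Python sketch (Python only because the post-processing is planned in Python; bugscrape.pl itself is Perl). The function names are hypothetical, the query parameters are copied from the example URLs, and the times are assumed to already be in Bugzilla server time, so no time-zone conversion is done. It only builds URLs and file names; it does not fetch anything.

```python
from datetime import datetime, timedelta
from urllib.parse import urlencode

BUGLIST = "https://bugzilla.mozilla.org/buglist.cgi"


def weekly_query_url(product: str, week_end: datetime) -> str:
    """Build the CSV buglist query for bugs FIXED in the week ending week_end.

    Hypothetical helper; the parameters mirror the example queries in this bug.
    """
    week_start = week_end - timedelta(days=7)
    params = [
        ("query_format", "advanced"),
        ("product", product),
        ("resolution", "FIXED"),
        ("chfieldfrom", week_start.strftime("%Y-%m-%d %H:%M")),
        ("chfieldto", week_end.strftime("%Y-%m-%d %H:%M")),
        ("chfield", "resolution"),
        ("chfieldvalue", "Fixed"),
        ("order", "Bug Number"),
        ("ctype", "csv"),
    ]
    # urlencode() uses quote_plus: spaces become '+' and ':' becomes '%3A'.
    return BUGLIST + "?" + urlencode(params)


def csv_filename(prefix: str, week_end: datetime) -> str:
    """Name the file after the day the week ends, e.g. 20090126-tb.csv."""
    return f"{week_end:%Y%m%d}-{prefix}.csv"
```

For the week ending 2009-01-26 03:00, `csv_filename("tb", datetime(2009, 1, 26, 3))` gives 20090126-tb.csv, and `weekly_query_url("Thunderbird", datetime(2009, 1, 26, 3))` reproduces the first example query parameter for parameter.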
Attached file TB Bugzilla scraper (obsolete)
Run it like this:
$> ./bugscrape.pl
This will leave the 2 .csv files you wanted in the current working directory.
$> ./bugscrape.pl 2009-01-04
This will scrape the csv reports for the full week preceding that date. Can you verify that the scraping gives you what you'd expect?
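The "full week preceding that date" behaviour could be computed along the following lines. This is a hypothetical Python sketch, not the logic in bugscrape.pl; the 03:00 cut-off is taken from the build-time discussion in this thread, and falling back to today's date when no argument is given is an assumption.

```python
from datetime import date, datetime, time, timedelta

# Builds usually start at 03:00 Bugzilla server time (PST/PDT).
CUTOFF = time(3, 0)


def week_window(end_day: date) -> tuple[datetime, datetime]:
    """Return (start, end): the full week before end_day, cut at 03:00."""
    end = datetime.combine(end_day, CUTOFF)
    return end - timedelta(days=7), end


def parse_end_day(argv: list[str]) -> date:
    """Take a YYYY-MM-DD argument if present, else assume today (hypothetical)."""
    if len(argv) > 1:
        return date.fromisoformat(argv[1])
    return date.today()
```

Called as `week_window(parse_end_day(sys.argv))` with 2009-01-26 on the command line, this yields 2009-01-19 03:00 to 2009-01-26 03:00, the range of the first example query.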
Attached file 20090126-tb.csv (obsolete)
Attached file 20090126-mailnews.csv (obsolete)
Attached file situation
Hi gozer, two issues:
1. The csv files you attached seem to have missing bug entries. See the diffs.
2. Another thing I never addressed (my fault, sorry): I should have these columns in the bug search query page, |bug_id,"opendate","bug_severity","priority","assigned_to","bug_status","resolution","op_sys","short_desc"|, which I set in my personal preferences. Can this be changed? (The default for your generated csv files is |bug_id,"bug_severity","priority","op_sys","assigned_to","bug_status","resolution","short_desc"|.)
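One way to produce the kind of diff mentioned here is to compare only the bug_id column of two exports, which ignores differences in column order. This is a hypothetical Python helper, assuming the header row names the first column bug_id, as in the column lists quoted above.

```python
import csv
import io


def bug_ids(csv_text: str) -> set[int]:
    """Collect the bug_id column from a Bugzilla buglist CSV export."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {int(row["bug_id"]) for row in reader}


def missing_bugs(expected_csv: str, scraped_csv: str) -> list[int]:
    """Bug numbers present in the expected export but absent from the scraped one."""
    return sorted(bug_ids(expected_csv) - bug_ids(scraped_csv))
```

To compare files, read each with `open(path, newline="", encoding="utf-8").read()` and pass the two strings in.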
1: Possibly because I was running these queries anonymously?
2: Again, that is mostly the result of running these queries without being logged in. Would there be a way to request these fields as part of the URL, rather than relying on the defaults a particular user has picked?
The columns can be requested with |&columnlist=opendate%2Cbug_severity...|
https://bugzilla.mozilla.org/buglist.cgi?query_format=advanced&columnlist=opendate%2Cbug_severity%2Cpriority%2Cassigned_to%2Cbug_status%2Cresolution%2Cop_sys%2Cshort_desc&product=Thunderbird&resolution=FIXED&chfieldfrom=2009-01-19+03%3A00&chfieldto=2009-01-26+03%3A00&chfield=resolution&chfieldvalue=Fixed&order=Bug+Number&ctype=csv
https://bugzilla.mozilla.org/buglist.cgi?query_format=advanced&columnlist=opendate%2Cbug_severity%2Cpriority%2Cassigned_to%2Cbug_status%2Cresolution%2Cop_sys%2Cshort_desc&product=MailNews+Core&resolution=FIXED&chfieldfrom=2009-01-19+03%3A00&chfieldto=2009-01-26+03%3A00&chfield=resolution&chfieldvalue=Fixed&order=Bug+Number&ctype=csv
https://bugzilla.mozilla.org/buglist.cgi?query_format=advanced&columnlist=opendate%2Cbug_severity%2Cpriority%2Cassigned_to%2Cbug_status%2Cresolution%2Cop_sys%2Cshort_desc&product=Thunderbird&resolution=FIXED&chfieldfrom=2009-01-26+03%3A00&chfieldto=2009-02-02+03%3A00&chfield=resolution&chfieldvalue=Fixed&order=Bug+Number&ctype=csv
https://bugzilla.mozilla.org/buglist.cgi?query_format=advanced&columnlist=opendate%2Cbug_severity%2Cpriority%2Cassigned_to%2Cbug_status%2Cresolution%2Cop_sys%2Cshort_desc&product=MailNews+Core&resolution=FIXED&chfieldfrom=2009-01-26+03%3A00&chfieldto=2009-02-02+03%3A00&chfield=resolution&chfieldvalue=Fixed&order=Bug+Number&ctype=csv

Thanks, philor, for the tip. I just added the column parameters and they seem to work great. Gozer, could you please update the script and re-grab the CSVs?

I don't think running the queries anonymously could result in different bug lists; assuming the Perl script sends exactly these queries, they should work identically. (I have had issues where & was escaped, e.g. as &amp;, and % was parsed as something else; I don't know whether that would affect this.) Perhaps justdave can help? (I get identical buglists whether logged in or not, excluding security/restricted bugs, of course.)
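Rather than editing each URL by hand, the column list can be appended programmatically; letting urlencode do the escaping also sidesteps the hand-escaping problems with & and % mentioned above. A hypothetical Python helper, using the same column order as the example URLs:

```python
from urllib.parse import urlencode, urlsplit

# Same column order as the example URLs in this bug.
COLUMNS = ("opendate", "bug_severity", "priority", "assigned_to",
           "bug_status", "resolution", "op_sys", "short_desc")


def add_columnlist(url: str, columns: tuple[str, ...] = COLUMNS) -> str:
    """Append an explicit columnlist so the CSV layout no longer depends on
    the column defaults of whoever (if anyone) is logged in."""
    separator = "&" if urlsplit(url).query else "?"
    # urlencode escapes the commas as %2C, as in the example URLs.
    return url + separator + urlencode({"columnlist": ",".join(columns)})
```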
Attached file TB Bugzilla scraper v2 (obsolete)
Fixed the URLs to include explicit column names.
Fixed incorrect date processing.
Attachment #360779 - Attachment is obsolete: true
Attached file 20090126-tb.csv v2
Attachment #360913 - Attachment is obsolete: true
Attachment #360914 - Attachment is obsolete: true
(In reply to comment #11)
> Created an attachment (id=361310) [details]
> TB Bugzilla scraper v2
>
> Fixed the URLs to include explicit column names
> Fixed incorrect date processing.

Three more issues:

1. AFAICS the start time should be 2009-01-19 03:00 and not 00:00; similarly, the end time should be 2009-01-26 03:00 and not 00:00. A nitpick, but build machines usually start their builds at 03:00 PST or PDT.

2. So far this is starting to look good, thanks. Could I please make a feature request to allow choosing the product (e.g. Thunderbird) from the command line? Similar to:

perl bugscrape.pl thunderbird <- Thunderbird csv query
perl bugscrape.pl mailnews <- Mailnews csv query

3. Is the Mozilla license boilerplate necessary?
(In reply to comment #14)
> (In reply to comment #11)
> > Created an attachment (id=361310) [details] [details]
> > TB Bugzilla scraper v2
> >
> > Fixed the URLs to include explicit column names
> > Fixed incorrect date processing.
>
> Three more issues.
>
> 1. AFAICS the start time should be 2009-01-19 03:00 and not 00:00, similarly,
> the end time should be 2009-01-26 03:00 and not 00:00. Nitpick because usually
> build machines start their build at 03:00 PST or PDT.

And that's what the time is set to, from the URLs this script would get:

https://bugzilla.mozilla.org/buglist.cgi?chfield=resolution&chfieldfrom=2009-02-02%2003%2000&chfieldto=2009-02-09%2003%3A00[...]

That's 2009-02-02 03:00 and 2009-02-09 03:00

> 2. So far, this is starting to look good, thanks. Could I pls have a feature
> request to allow the parsing of product Thunderbird from the commandline?
> Similar to:
>
> perl bugscrape.pl thunderbird <- Thunderbird csv query
> perl bugscrape.pl mailnews <- Mailnews csv query

Sure, that's not very hard either. Added 2 options:

-p Product
-o output-prefix

-o defaults to lowercased product, so your example would become

$> ./bugscrape.pl -p Thunderbird
$> ./bugscrape.pl -p "MailNews Core" -o mailnews

> 3. Is the Mozilla license boilerplate necessary?

What are you talking about ?

As for the perl dependencies on the mac, with MacPorts, try

$> port install p5-date-calc p5-timedate
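For comparison, the two options as described could be expressed with Python's argparse roughly as follows. This is a sketch, not the Perl script's actual option handling; the default product of Thunderbird is an assumption.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    """Parse -p/-o as described above (hypothetical Python equivalent)."""
    parser = argparse.ArgumentParser(
        description="Save weekly Bugzilla buglist CSVs (sketch).")
    parser.add_argument("-p", "--product", default="Thunderbird",
                        help="Bugzilla product to query (default assumed)")
    parser.add_argument("-o", "--output-prefix", default=None,
                        help="file name prefix; defaults to the lowercased product")
    args = parser.parse_args(argv)
    if args.output_prefix is None:
        args.output_prefix = args.product.lower()
    return args
```

Without -o, "MailNews Core" yields the prefix "mailnews core" (with a space), which is why the example passes -o mailnews explicitly.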
Attached file TB Bugzilla scraper v3
Attachment #361310 - Attachment is obsolete: true
(In reply to comment #15)
> And that's what the time is set to, from the URLs this script would get:
>
> https://bugzilla.mozilla.org/buglist.cgi?chfield=resolution&chfieldfrom=2009-02-02%2003%2000&chfieldto=2009-02-09%2003%3A00[...]
>
> That's 2009-02-02 03:00 and 2009-02-09 03:00

So in theory this should be identical, but I first need to be able to run bugscrape.pl locally. :-/

> Sure, that's not very hard either. Added 2 options:
>
> -p Product
> -o output-prefix
>
> -o defaults to lowercased product, so your example would become
>
> $> ./bugscrape.pl -p Thunderbird
> $> ./bugscrape.pl -p "MailNews Core" -o mailnews

Awesome, thanks!

> > 3. Is the Mozilla license boilerplate necessary?
>
> What are you talking about ?

http://www.mozilla.org/MPL/boilerplate-1.1/

> As for the perl dependencies on the mac, with MacPorts, try
>
> $> port install p5-date-calc p5-timedate

Thanks for the hint, testing now...
Alias: bugscrape
(In reply to comment #16)
> Created an attachment (id=361703) [details]
> TB Bugzilla scraper v3

This is wonderful, Gozer! :) It works excellently locally on my machine, and I verified that it produces the same results as the online bug search.

Some more feature requests:

1 "-P" should accept null as a parameter, so that it will probe all possible products.

2 "Component", "Keywords", "Status", "Severity", "Priority", "Hardware" and "OS" should be added as possible additional parameters, and "Resolution" should be expanded:
-- Component can be used like "...&product=Thunderbird&component=General&...", or "...&product=Thunderbird&component=&..." if all components under product Thunderbird are required; in that case a null value is entered for Component.
-- Keywords should look like "...&keywords_type=allwords&keywords=hang...": the word hang is variable and can be null; keywords_type can only be "allwords", "anywords" or "nowords", and should only appear if &keywords is not null.
-- Status should ideally look like "...&bug_status=UNCONFIRMED&..." and can only take as parameters: "UNCONFIRMED" "NEW" "ASSIGNED" "REOPENED" "RESOLVED" "VERIFIED" "CLOSED"
-- Severity should look like "...&bug_severity=blocker&..." and can only take as parameters: "blocker" "critical" "major" "normal" "minor" "trivial" "enhancement"
-- Priority should look like "...&priority=--&..." and can only take as parameters: "--" "P1" "P2" "P3" "P4" "P5"
-- Hardware should look like "...&rep_platform=All&..." and can only take as parameters: "All" "ARM" "DEC" "HP" "PowerPC" "x86" "x86_64" "SGI" "Sun" "XScale" "Other"
-- OS should look like "...&op_sys=All&..." and can only take the following as parameters: "All" "Windows 95" "Windows 98" "Windows ME" "Windows NT" "Windows 2000" "Windows XP" "Windows Server 2003" "Windows Vista" "Windows 7" "Windows CE" "Windows Mobile 6 Standard" "Windows Mobile 6 Professional" "Mac System 7" "Mac System 7.5" "Mac System 7.6.1" "Mac System 8.0" "Mac System 8.5" "Mac System 8.6" "Mac System 9.x" "Mac OS X" "Linux" "Linux (embedded)" "BSDI" "FreeBSD" "NetBSD" "OpenBSD" "AIX" "BeOS" "HP-UX" "IRIX" "Neutrino" "OpenVMS" "OS/2" "OSF/1" "SunOS" "Solaris" "OpenSolaris" "Symbian" "Other"
-- Resolution should accept the following parameters from the command line (besides Fixed): "FIXED" "INVALID" "WONTFIX" "DUPLICATE" "WORKSFORME" "INCOMPLETE" "EXPIRED" "MOVED"

3 Should a parameter default to null, don't add the &FOO block at all, i.e. the above-mentioned "...&product=Thunderbird&component=&..." should end up as "...&product=Thunderbird&..." <-- note the missing component word, which is the correct form for a null parameter. (The former would work as well, but the latter is tidier, which will be useful when debugging large search URLs.)

4 Turn all %args in bugscrape.pl into command-line options, accepting null as a parameter as mentioned in item 3, i.e. bugscrape should accept the start date and end date as command-line arguments. (You've already turned $product into a command-line option.) We can then easily adjust the range as needed, perhaps over a month, or perhaps over product releases.

5 License boilerplate from http://www.mozilla.org/MPL/boilerplate-1.1/ should be added.

That said, I'm learning Perl from reading bugscrape.pl, and the basics don't seem too difficult at all. :) These improvements should allow us to feed our requirements into bugscrape.pl and let it become as generic as possible. I can already search for Firefox's changed bugs over the past week simply by passing "-P Firefox" as a command-line argument, which turns bugscrape.pl version 3 into an automated variant of The Burning Edge.
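Items 1 to 3 amount to a query builder that checks each value against its allowed list and leaves empty parameters out entirely. A minimal Python sketch with hypothetical names; only some of the value lists above are included, and fields without a list (product, component, keywords, and so on) are passed through unchecked.

```python
from urllib.parse import urlencode

BUGLIST = "https://bugzilla.mozilla.org/buglist.cgi"

# Allowed values for some fields, copied from the lists above (subset only).
ALLOWED = {
    "bug_status": {"UNCONFIRMED", "NEW", "ASSIGNED", "REOPENED",
                   "RESOLVED", "VERIFIED", "CLOSED"},
    "bug_severity": {"blocker", "critical", "major", "normal",
                     "minor", "trivial", "enhancement"},
    "priority": {"--", "P1", "P2", "P3", "P4", "P5"},
    "resolution": {"FIXED", "INVALID", "WONTFIX", "DUPLICATE",
                   "WORKSFORME", "INCOMPLETE", "EXPIRED", "MOVED"},
}
KEYWORD_TYPES = {"allwords", "anywords", "nowords"}


def build_query(keywords_type: str = "allwords", **fields: str) -> str:
    """Build a CSV buglist URL, omitting every field whose value is empty."""
    params = [("query_format", "advanced")]
    for name, value in fields.items():
        if not value:
            # Null parameter: leave out "&name=" entirely (item 3), so an
            # empty product means "all products" (item 1).
            continue
        allowed = ALLOWED.get(name)
        if allowed is not None and value not in allowed:
            raise ValueError(f"{name}={value!r} is not one of {sorted(allowed)}")
        params.append((name, value))
    # keywords_type is only meaningful when keywords is non-empty.
    if fields.get("keywords"):
        if keywords_type not in KEYWORD_TYPES:
            raise ValueError(f"keywords_type={keywords_type!r} is not valid")
        params.append(("keywords_type", keywords_type))
    params.append(("ctype", "csv"))
    return BUGLIST + "?" + urlencode(params)
```

For example, `build_query(product="Thunderbird", component="", resolution="FIXED")` contains no `component=` at all, matching the tidier form requested in item 3.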
At this time I don't have time to invest in this script, and I believe Gary can make the changes he needs. Ping me if you run into roadblocks.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → WONTFIX