Closed
Bug 475970
(bugscrape)
Opened 17 years ago
Closed 16 years ago
Save weekly csv files of buglists of changes onto a folder on the server
Categories
(Mozilla Messaging Graveyard :: Server Operations, defect)
Mozilla Messaging Graveyard
Server Operations
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: gkw, Unassigned)
References
Details
Attachments
(6 files, 4 obsolete files)
https://bugzilla.mozilla.org/buglist.cgi?query_format=advanced&product=Thunderbird&resolution=FIXED&chfieldfrom=2009-01-19+03%3A00&chfieldto=2009-01-26+03%3A00&chfield=resolution&chfieldvalue=Fixed&order=Bug+Number&ctype=csv
https://bugzilla.mozilla.org/buglist.cgi?query_format=advanced&product=MailNews+Core&resolution=FIXED&chfieldfrom=2009-01-19+03%3A00&chfieldto=2009-01-26+03%3A00&chfield=resolution&chfieldvalue=Fixed&order=Bug+Number&ctype=csv
are examples of queries from which to get CSV files. The first is for Thunderbird (TB), the second for MailNews Core.
https://bugzilla.mozilla.org/buglist.cgi?query_format=advanced&product=Thunderbird&resolution=FIXED&chfieldfrom=2009-02-02+03%3A00&chfieldto=2009-02-02+03%3A00&chfield=resolution&chfieldvalue=Fixed&order=Bug+Number&ctype=csv
https://bugzilla.mozilla.org/buglist.cgi?query_format=advanced&product=MailNews+Core&resolution=FIXED&chfieldfrom=2009-02-02+03%3A00&chfieldto=2009-02-02+03%3A00&chfield=resolution&chfieldvalue=Fixed&order=Bug+Number&ctype=csv
The only variation among these buglists is the date range: the pair on top runs from 2009-01-19 03:00 PST (Bugzilla server time, i.e. PST) to 2009-01-26 03:00. So there should be two CSV files for the week _ending_ 2009-01-26 03:00, one for TB and one for MailNews Core; calling them 20090126-tb.csv and 20090126-mailnews.csv should suffice.
The next week, a cron job should pull and save buglists 20090202-tb.csv and 20090202-mailnews.csv and so on.
These buglists should no longer be needed after a fortnight or so, but until a Python script to post-process these CSV files into readable HTML is finalized, I would like them not to be deleted.
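The weekly URL and file naming scheme described above can be sketched as follows. This is only an illustration in Python, not the script eventually attached to this bug; the function names `weekly_query` and `csv_name` are made up for the example, and nothing is downloaded here.

```python
from datetime import datetime, timedelta
from urllib.parse import urlencode

BUGLIST = "https://bugzilla.mozilla.org/buglist.cgi"


def weekly_query(product, week_end):
    """Build the CSV buglist URL for the 7 days ending at week_end.

    Hypothetical helper: parameter names and order are copied from the
    example URLs in this comment.
    """
    week_start = week_end - timedelta(days=7)
    fmt = "%Y-%m-%d %H:%M"
    params = [
        ("query_format", "advanced"),
        ("product", product),
        ("resolution", "FIXED"),
        ("chfieldfrom", week_start.strftime(fmt)),
        ("chfieldto", week_end.strftime(fmt)),
        ("chfield", "resolution"),
        ("chfieldvalue", "Fixed"),
        ("order", "Bug Number"),
        ("ctype", "csv"),
    ]
    # urlencode turns spaces into "+" and ":" into "%3A", as in the URLs above.
    return BUGLIST + "?" + urlencode(params)


def csv_name(week_end, suffix):
    """File name for the week ending at week_end, e.g. 20090126-tb.csv."""
    return f"{week_end:%Y%m%d}-{suffix}.csv"


week_end = datetime(2009, 1, 26, 3, 0)
for product, suffix in (("Thunderbird", "tb"), ("MailNews Core", "mailnews")):
    print(csv_name(week_end, suffix), weekly_query(product, week_end))
```

A weekly cron job would only need to call this with the current week's end time and save each response under the printed file name.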
| Reporter | ||
Comment 1•17 years ago
(In reply to comment #0)
> https://bugzilla.mozilla.org/buglist.cgi?query_format=advanced&product=Thunderbird&resolution=FIXED&chfieldfrom=2009-02-02+03%3A00&chfieldto=2009-02-02+03%3A00&chfield=resolution&chfieldvalue=Fixed&order=Bug+Number&ctype=csv
>
> https://bugzilla.mozilla.org/buglist.cgi?query_format=advanced&product=MailNews+Core&resolution=FIXED&chfieldfrom=2009-02-02+03%3A00&chfieldto=2009-02-02+03%3A00&chfield=resolution&chfieldvalue=Fixed&order=Bug+Number&ctype=csv
Sorry, the second set should be:
https://bugzilla.mozilla.org/buglist.cgi?query_format=advanced&product=Thunderbird&resolution=FIXED&chfieldfrom=2009-01-26+03%3A00&chfieldto=2009-02-02+03%3A00&chfield=resolution&chfieldvalue=Fixed&order=Bug+Number&ctype=csv
https://bugzilla.mozilla.org/buglist.cgi?query_format=advanced&product=MailNews+Core&resolution=FIXED&chfieldfrom=2009-01-26+03%3A00&chfieldto=2009-02-02+03%3A00&chfield=resolution&chfieldvalue=Fixed&order=Bug+Number&ctype=csv
Comment 2•16 years ago
Run like
$> ./bugscrape.pl
Will leave the 2 .csv files you wanted in the current working directory.
$> ./bugscrape.pl 2009-01-04
Will scrape the CSV reports for the full week preceding that date.
Can you just verify that the scraping is giving you what you'd expect?
Comment 3•16 years ago
Comment 4•16 years ago
Comment 5•16 years ago
Comment 6•16 years ago
| Reporter | ||
Comment 7•16 years ago
Hi gozer, two issues:
1. The CSV files you attached seem to have missing bug entries. See the diffs.
2. Another thing I never addressed (my fault, sorry) is that I need these columns, which I have set in my personal preferences on the bug search query page -- |bug_id,"opendate","bug_severity","priority","assigned_to","bug_status","resolution","op_sys","short_desc"|. Can this be changed? (The default for your generated CSV files is |bug_id,"bug_severity","priority","op_sys","assigned_to","bug_status","resolution","short_desc"|)
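One way to pin down which entries are missing is to compare the bug_id columns of two CSV exports. The following is a minimal sketch in Python, assuming both files carry a `bug_id` header as in the column lists above; the helper names and the sample rows are invented for illustration and are not taken from the attached files.

```python
import csv
import io


def bug_ids(csv_text):
    """Collect the bug_id column of a Bugzilla CSV export as a set of strings."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["bug_id"] for row in reader}


def missing_ids(expected_csv, scraped_csv):
    """Return bug ids present in expected_csv but absent from scraped_csv."""
    return sorted(bug_ids(expected_csv) - bug_ids(scraped_csv), key=int)


# Made-up sample data, standing in for a manual export and a scraped file.
expected = 'bug_id,"short_desc"\n1001,"first"\n1002,"second"\n1003,"third"\n'
scraped = 'bug_id,"short_desc"\n1001,"first"\n1003,"third"\n'
print(missing_ids(expected, scraped))
```

Comparing ids rather than whole lines keeps the check independent of column order, which differs between the two exports mentioned above.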
Comment 8•16 years ago
1: Possibly because I was running these queries anonymously?
2: Again, mostly the result of running these queries without being logged in. Would there be a way to request these fields as part of the URL, rather than relying on the defaults a particular user has picked?
Comment 9•16 years ago
Columns are |&columnlist=opendate%2Cbug_severity...|
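As a rough illustration of how that parameter is formed, the sketch below joins the column names requested in comment 7 with commas and URL-encodes the result, which yields the `%2C` separators shown above. bug_id is left out on the assumption that it is always present, since it already appears in the default output quoted in comment 7. The helper name `columnlist_param` is hypothetical.

```python
from urllib.parse import urlencode

# Column names from comment 7, minus bug_id (assumed to be included anyway).
COLUMNS = [
    "opendate", "bug_severity", "priority", "assigned_to",
    "bug_status", "resolution", "op_sys", "short_desc",
]


def columnlist_param(columns):
    """Encode a list of column names as a single columnlist=... parameter."""
    # The comma-joined value is percent-encoded, so each "," becomes "%2C".
    return urlencode({"columnlist": ",".join(columns)})


print("&" + columnlist_param(COLUMNS))
```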
| Reporter | ||
Comment 10•16 years ago
https://bugzilla.mozilla.org/buglist.cgi?query_format=advanced&columnlist=opendate%2Cbug_severity%2Cpriority%2Cassigned_to%2Cbug_status%2Cresolution%2Cop_sys%2Cshort_desc&product=Thunderbird&resolution=FIXED&chfieldfrom=2009-01-19+03%3A00&chfieldto=2009-01-26+03%3A00&chfield=resolution&chfieldvalue=Fixed&order=Bug+Number&ctype=csv
https://bugzilla.mozilla.org/buglist.cgi?query_format=advanced&columnlist=opendate%2Cbug_severity%2Cpriority%2Cassigned_to%2Cbug_status%2Cresolution%2Cop_sys%2Cshort_desc&product=MailNews+Core&resolution=FIXED&chfieldfrom=2009-01-19+03%3A00&chfieldto=2009-01-26+03%3A00&chfield=resolution&chfieldvalue=Fixed&order=Bug+Number&ctype=csv
https://bugzilla.mozilla.org/buglist.cgi?query_format=advanced&columnlist=opendate%2Cbug_severity%2Cpriority%2Cassigned_to%2Cbug_status%2Cresolution%2Cop_sys%2Cshort_desc&product=Thunderbird&resolution=FIXED&chfieldfrom=2009-01-26+03%3A00&chfieldto=2009-02-02+03%3A00&chfield=resolution&chfieldvalue=Fixed&order=Bug+Number&ctype=csv
https://bugzilla.mozilla.org/buglist.cgi?query_format=advanced&columnlist=opendate%2Cbug_severity%2Cpriority%2Cassigned_to%2Cbug_status%2Cresolution%2Cop_sys%2Cshort_desc&product=MailNews+Core&resolution=FIXED&chfieldfrom=2009-01-26+03%3A00&chfieldto=2009-02-02+03%3A00&chfield=resolution&chfieldvalue=Fixed&order=Bug+Number&ctype=csv
Thanks philor for the tip. I just added the column parameters and they seem to work great. Gozer, could you pls update the script and re-grab the CSVs?
I don't think running the queries anonymously could result in different bug lists. Assuming the Perl script issues exactly these queries, they should work identically. (Note that I've had issues where & was parsed as &amp; and % as something else, or similar; I don't know whether that would affect this.) Perhaps justdave can help?
(I get identical buglists either not-logged-in or logged-in, excluding security / restricted bugs, of course.)
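To check whether a script really sends the same query as a hand-built URL, the decoded parameters can be compared instead of the raw strings. This is a hedged sketch in Python; `same_query` is a made-up helper, and the `&amp;` case is only an assumed example of the kind of mangling mentioned above.

```python
from urllib.parse import parse_qsl, urlsplit


def query_params(url):
    """Decode a URL's query string into a sorted list of (name, value) pairs."""
    return sorted(parse_qsl(urlsplit(url).query, keep_blank_values=True))


def same_query(url_a, url_b):
    """True if both URLs hit the same page with the same decoded parameters."""
    a, b = urlsplit(url_a), urlsplit(url_b)
    same_page = (a.scheme, a.netloc, a.path) == (b.scheme, b.netloc, b.path)
    return same_page and query_params(url_a) == query_params(url_b)


base = "https://bugzilla.mozilla.org/buglist.cgi?"
by_hand = base + "product=MailNews+Core&ctype=csv"
reordered = base + "ctype=csv&product=MailNews%20Core"   # "+" vs "%20", other order
mangled = base + "product=MailNews+Core&amp;ctype=csv"   # "&" turned into "&amp;"

print(same_query(by_hand, reordered))
print(same_query(by_hand, mangled))
```

Parameter order and the choice between `+` and `%20` for spaces do not affect the comparison, while a mangled separator does.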
Comment 11•16 years ago
Fixed the URLs to include explicit column names
Fixed incorrect date processing.
Attachment #360779 -
Attachment is obsolete: true
Comment 12•16 years ago
Attachment #360913 -
Attachment is obsolete: true
Comment 13•16 years ago
Attachment #360914 -
Attachment is obsolete: true
| Reporter | ||
Comment 14•16 years ago
(In reply to comment #11)
> Created an attachment (id=361310) [details]
> TB Bugzilla scraper v2
>
> Fixed the URLs to include explicit column names
> Fixed incorrect date processing.
Three more issues.
1. AFAICS the start time should be 2009-01-19 03:00 and not 00:00, similarly, the end time should be 2009-01-26 03:00 and not 00:00. Nitpick because usually build machines start their build at 03:00 PST or PDT.
2. So far, this is starting to look good, thanks. Could I pls have a feature request to allow the parsing of product Thunderbird from the commandline? Similar to:
perl bugscrape.pl thunderbird <- Thunderbird csv query
perl bugscrape.pl mailnews <- Mailnews csv query
3. Is the Mozilla license boilerplate necessary?
Comment 15•16 years ago
(In reply to comment #14)
> (In reply to comment #11)
> > Created an attachment (id=361310) [details] [details]
> > TB Bugzilla scraper v2
> >
> > Fixed the URLs to include explicit column names
> > Fixed incorrect date processing.
>
> Three more issues.
>
> 1. AFAICS the start time should be 2009-01-19 03:00 and not 00:00, similarly,
> the end time should be 2009-01-26 03:00 and not 00:00. Nitpick because usually
> build machines start their build at 03:00 PST or PDT.
And that's what the time is set to, from the URLs this script would get:
https://bugzilla.mozilla.org/buglist.cgi?chfield=resolution&chfieldfrom=2009-02-02%2003%2000&chfieldto=2009-02-09%2003%3A00[...]
That's 2009-02-02 03:00 and 2009-02-09 03:00
> 2. So far, this is starting to look good, thanks. Could I pls have a feature
> request to allow the parsing of product Thunderbird from the commandline?
> Similar to:
>
> perl bugscrape.pl thunderbird <- Thunderbird csv query
> perl bugscrape.pl mailnews <- Mailnews csv query
Sure, that's not very hard either. Added 2 options:
-p Product
-o output-prefix
-o defaults to lowercased product, so your example would become
$> ./bugscrape.pl -p Thunderbird
$> ./bugscrape.pl -p "MailNews Core" -o mailnews
> 3. Is the Mozilla license boilerplate necessary?
What are you talking about ?
As for the perl dependencies on the mac, with MacPorts, try
$> port install p5-date-calc p5-timedate
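For readers more at home in Python than Perl, the two options described above could be mirrored roughly like this. It is a sketch only, not a port of bugscrape.pl: making `-p` mandatory and keeping the optional trailing date from comment 2 are assumptions, and `parse_args` is a name chosen for the example.

```python
import argparse
from datetime import date, datetime


def parse_date(text):
    """Parse a YYYY-MM-DD command-line argument into a date."""
    return datetime.strptime(text, "%Y-%m-%d").date()


def parse_args(argv):
    parser = argparse.ArgumentParser(description="Weekly Bugzilla CSV scraper (sketch)")
    # Assumption: the product must be given explicitly.
    parser.add_argument("-p", dest="product", required=True, help="Bugzilla product")
    parser.add_argument("-o", dest="prefix", default=None, help="output file prefix")
    # Assumption: an optional end date, as in "./bugscrape.pl 2009-01-04".
    parser.add_argument("end_date", nargs="?", type=parse_date, default=None)
    args = parser.parse_args(argv)
    if args.prefix is None:
        # -o defaults to the lowercased product name.
        args.prefix = args.product.lower()
    if args.end_date is None:
        args.end_date = date.today()
    return args


args = parse_args(["-p", "MailNews Core", "-o", "mailnews", "2009-01-04"])
print(args.product, args.prefix, args.end_date)
```

The explicit `-o mailnews` matters because the lowercased default for "MailNews Core" would contain a space.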
Comment 16•16 years ago
Attachment #361310 -
Attachment is obsolete: true
| Reporter | ||
Comment 17•16 years ago
(In reply to comment #15)
> And that's what the time is set to, from the URLs this script would get:
>
> https://bugzilla.mozilla.org/buglist.cgi?chfield=resolution&chfieldfrom=2009-02-02%2003%2000&chfieldto=2009-02-09%2003%3A00[...]
>
> That's 2009-02-02 03:00 and 2009-02-09 03:00
So theoretically, this should be identical, but I have to be able to run bugscrape.pl locally first. :-/
> Sure, that's not very hard either. Added 2 options:
>
> -p Product
> -o output-prefix
>
> -o defaults to lowercased product, so your example would become
>
> $> ./bugscrape.pl -p Thunderbird
> $> ./bugscrape.pl -p "MailNews Core" -o mailnews
Awesome, thanks!
> > 3. Is the Mozilla license boilerplate necessary?
>
> What are you talking about ?
http://www.mozilla.org/MPL/boilerplate-1.1/
> As for the perl dependencies on the mac, with MacPorts, try
>
> $> port install p5-date-calc p5-timedate
Thanks for the hint, testing now..
Alias: bugscrape
| Reporter | ||
Comment 18•16 years ago
(In reply to comment #16)
> Created an attachment (id=361703) [details]
> TB Bugzilla scraper v3
This is wonderful, Gozer! :) It works excellently locally on my machine, and I verified that it produces the same results as the online bug search.
Some more feature requests:
1 "-p" should accept null as a parameter, so that it queries all products.
2 "Component", "Keywords", "Status", "Severity", "Priority", "Hardware" and "OS" should be added as possible additional parameters, and "Resolution" should be expanded:
-- It can be used like "...&product=Thunderbird&component=General&..." or else "...&product=Thunderbird&component=&..." can be used if all components under product Thunderbird are required, in this case a null value is entered for Component.
-- Keywords should be like this "...&keywords_type=allwords&keywords=hang...", the word hang is variable and can be null, keywords_type can only be "allwords", "anywords" or "nowords", and only show if &keywords is not null.
-- Status should ideally be like "...&bug_status=UNCONFIRMED&..." and can only take as parameters:
"UNCONFIRMED"
"NEW"
"ASSIGNED"
"REOPENED"
"RESOLVED"
"VERIFIED"
"CLOSED"
-- Severity should be like "...&bug_severity=blocker&..." and can only take as parameters:
"blocker"
"critical"
"major"
"normal"
"minor"
"trivial"
"enhancement"
-- Priority should be like "...&priority=--&..." and can only take as parameters:
"--"
"P1"
"P2"
"P3"
"P4"
"P5"
-- Hardware should be like "...&rep_platform=All&..." and can only take as parameters:
"All"
"ARM"
"DEC"
"HP"
"PowerPC"
"x86"
"x86_64"
"SGI"
"Sun"
"XScale"
"Other"
-- OS should be like "...&op_sys=All&..." and can only take the following as parameters:
"All"
"Windows 95"
"Windows 98"
"Windows ME"
"Windows NT"
"Windows 2000"
"Windows XP"
"Windows Server 2003"
"Windows Vista"
"Windows 7"
"Windows CE"
"Windows Mobile 6 Standard"
"Windows Mobile 6 Professional"
"Mac System 7"
"Mac System 7.5"
"Mac System 7.6.1"
"Mac System 8.0"
"Mac System 8.5"
"Mac System 8.6"
"Mac System 9.x"
"Mac OS X"
"Linux"
"Linux (embedded)"
"BSDI"
"FreeBSD"
"NetBSD"
"OpenBSD"
"AIX"
"BeOS"
"HP-UX"
"IRIX"
"Neutrino"
"OpenVMS"
"OS/2"
"OSF/1"
"SunOS"
"Solaris"
"OpenSolaris"
"Symbian"
"Other"
-- Resolution should accept the following parameters from the command-line (besides Fixed):
"FIXED"
"INVALID"
"WONTFIX"
"DUPLICATE"
"WORKSFORME"
"INCOMPLETE"
"EXPIRED"
"MOVED"
3 If a parameter defaults to null, don't add its &FOO block at all, i.e. the above-mentioned "...&product=Thunderbird&component=&..." should end up as "...&product=Thunderbird&..." <-- note the missing component word, which is the correct form for a null parameter. (The former would work as well, but the latter is tidier, which will be useful when debugging large search URLs.)
4 Turn all %args in bugscrape.pl into command line options, accepting null as a parameter as mentioned in part 3, i.e. bugscrape should accept the startdate and enddate as command line arguments. (You've already turned $product into a command-line option) We can then easily adjust as needed, perhaps over a month, or perhaps over product releases.
5 License boilerplate from http://www.mozilla.org/MPL/boilerplate-1.1/ should be added.
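Requests 2 and 3 above amount to validating each value against its allowed list and leaving null parameters out of the URL. A minimal Python sketch of that behaviour follows; `build_query` is a hypothetical helper, and the `rep_platform` and `op_sys` lists are omitted only for brevity.

```python
from urllib.parse import urlencode

# Allowed values copied from the lists above (Hardware and OS left out for brevity).
ALLOWED = {
    "bug_status": {"UNCONFIRMED", "NEW", "ASSIGNED", "REOPENED",
                   "RESOLVED", "VERIFIED", "CLOSED"},
    "bug_severity": {"blocker", "critical", "major", "normal",
                     "minor", "trivial", "enhancement"},
    "priority": {"--", "P1", "P2", "P3", "P4", "P5"},
    "resolution": {"FIXED", "INVALID", "WONTFIX", "DUPLICATE",
                   "WORKSFORME", "INCOMPLETE", "EXPIRED", "MOVED"},
    "keywords_type": {"allwords", "anywords", "nowords"},
}


def build_query(**fields):
    """Encode the non-empty fields as a query string, validating known ones."""
    params = []
    for name, value in fields.items():
        if not value:
            continue  # null parameter: omit the whole &name= block
        if name in ALLOWED and value not in ALLOWED[name]:
            raise ValueError(f"{value!r} is not a valid {name}")
        params.append((name, value))
    if not fields.get("keywords"):
        # keywords_type only makes sense when keywords is set.
        params = [(n, v) for n, v in params if n != "keywords_type"]
    return urlencode(params)


print(build_query(product="Thunderbird", component="", bug_severity="blocker"))
print(build_query(product="Thunderbird", keywords_type="allwords", keywords="hang"))
```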
That said, I'm learning Perl from reading bugscrape.pl, and the basics don't seem too difficult at all.. :)
These improvements should allow us to feed our requirements into bugscrape.pl and let it become as generic as possible. With bugscrape.pl version 3 I can already search for Firefox's changed bugs over the past week simply by passing "-p Firefox" as a command-line argument, making it an automated variant of The Burning Edge.
Comment 19•16 years ago
At this time, I don't have time to invest into this script, and I believe gary can make the changes he needs. Ping me if you run into roadblocks.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → WONTFIX