Closed Bug 1162105 - disable daily CSV cron job
Opened 10 years ago; Closed 8 years ago
Categories: Testing Graveyard :: Sisyphus, defect
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: lars; Assignee: Unassigned
We need a migration plan. It appears that Sisyphus is the last consumer of the old Socorro CSV files known as the DailyURL.
As the Socorro project migrates to AWS, we need to make sure that the data from Socorro still flows into Sisyphus. Is Sisyphus also moving to AWS? If Sisyphus does not also move, then the Socorro data must cross the boundary out of AWS and may incur data-transfer costs.
We'd like to audit exactly what data Sisyphus uses from the Socorro CSV files and, perhaps, devise a way for Sisyphus to use the data directly from Socorro. Alternatively, perhaps Socorro could inject data directly into the Sisyphus database.
Comment 1•10 years ago
The fields that are definitely in use are:
signature
url
product
version
branch
os_name
os_version
cpu_info
I believe the url is the only sensitive data item.
It would be nice to be able to get the exploitability rating so I could prioritize testing those first.
I would prefer to pull the data from Socorro given a date range, usually just a day. Sisyphus/Bughunter can't handle the full volume of crashes on a daily basis. I normally load them as needed when the current set of crash urls have been processed.
Comment 2•10 years ago
(In reply to Bob Clary [:bc:] from comment #1)
> The fields that are definitely in use are:
>
> signature
> url
> product
> version
> branch
> os_name
> os_version
> cpu_info
>
> I believe the url is the only sensitive data item.
>
> It would be nice to be able to get the exploitability rating so I could
> prioritize testing those first.
>
> I would prefer to pull the data from Socorro given a date range, usually
> just a day. Sisyphus/Bughunter can't handle the full volume of crashes on a
> daily basis. I normally load them as needed when the current set of crash
> urls have been processed.
I believe that you could get all of this info now as JSON from our API:
https://crash-stats.mozilla.com/api
Peter, would you mind helping to figure out exactly what API call(s) we'd need to provide the above?
Bob, you'd need to have an account on crash-stats with PII access (you can generate a token for use with scripts/curl/etc); you could then get this data close to real time rather than waiting for a daily CSV dump.
Flags: needinfo?(peterbe)
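A minimal sketch of what such a scripted call could look like, assuming Python's requests library and the Auth-Token header crash-stats accepts for API tokens (the token value is a placeholder):

import requests  # third-party HTTP client (pip install requests)

# Placeholder token: generate one on crash-stats under your account.
API_TOKEN = "xxxxxxxxxxxxxxxxxxxxxxxx"
URL = "https://crash-stats.mozilla.com/api/SuperSearchUnredacted/"

resp = requests.get(
    URL,
    headers={"Auth-Token": API_TOKEN},  # crash-stats reads API tokens from this header
    params={"product": "Firefox", "_results_number": 100},
)
resp.raise_for_status()
for crash in resp.json()["hits"]:  # SuperSearch reports matches under "hits"
    print(crash.get("signature"), crash.get("url"))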
Comment 3•10 years ago
It would be helpful if the API allowed me to get only the crashes I will actually use in testing. I ignore crashes without urls, crashes on old versions, crashes on private urls, and duplicates.
Not sure if I have PII but I can see urls and exploitability ratings when logged in.
To answer the question about where sisyphus/bughunter is going to live: as far as I know we are staying in PHX1 for now and will move to SCL3. The need for OS X, Windows, and Linux workers; the need to keep the workers up to date with security patches; the need to run under snapshots that can be rolled back; and the sensitive nature of the urls and crashes all give me pause when thinking of moving it out of house.
I think the actual volume of data to be moved will be much smaller than you might expect from the daily crash dumps: 1) I won't be needing new data every day; more like once a week. 2) The overall size will be much smaller.
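A rough sketch of the client-side filtering this describes, in Python; the field names mirror the CSV columns from comment 1, and CURRENT_VERSIONS / seen_urls are hypothetical bookkeeping, not part of any Socorro API:

# Hypothetical filter over crash records. Field names mirror the CSV columns
# from comment 1; private-url detection is site-specific and omitted here.
CURRENT_VERSIONS = {"38.0", "39.0a2", "40.0a1"}  # illustrative version set
seen_urls = set()  # dedup bookkeeping across one load

def usable(crash):
    url = crash.get("url")
    if not url:
        return False  # ignore crashes without urls
    if crash.get("version") not in CURRENT_VERSIONS:
        return False  # ignore crashes on old versions
    if url in seen_urls:
        return False  # ignore duplicates
    seen_urls.add(url)
    return True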
Comment 4•10 years ago
I'm going to throw the ball over to Adrian who is our resident master of SuperSearch.
The endpoint to use is https://crash-stats.mozilla.com/api/#SuperSearchUnredacted which *requires* that you have the following two permissions under your name:
* View Exploitability Results
* View Personal Identifiable Information
If it doesn't appear on https://crash-stats.mozilla.com/api/ for you Bob, you'll have to file bugs against Socorro to have those permissions added to your name.
One catch is that SuperSearchUnredacted currently returns EVERYTHING; the ability to specify exact fields is in the works and should be available in a couple of weeks.
Flags: needinfo?(peterbe) → needinfo?(adrian)
Comment 5•10 years ago
Actually, the truth is that the API will not work for this until we have the ability to specify which columns should come back. If you request all the fields, you risk getting back such an enormous JSON blob that it grinds our server to a halt.
Adrian, can we make this bug depend on the one you're working on for the columns?
Comment 6•10 years ago
Peter answered the question very well. You can use the `date` parameter to get only the day you want, and any other param to filter the results further. You'll also need to use `_results_offset` and `_results_number` to page through the data; by default it returns only the first 100 results.
I'd even recommend that you use the UI we have [0] to make the initial search, then click the "More options" link under the search form and copy the link to the public API it gives you. You'll just need to replace `SuperSearch` with `SuperSearchUnredacted` to get all the data you need.
Here's an example API call: https://crash-stats.mozilla.com/api/SuperSearch/?date=%3E%3D2015-05-06T00%3A00%3A00&date=%3C2015-05-07T00%3A00%3A00
[0] https://crash-stats.mozilla.com/search/
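As a sketch, paging through that one-day window with the parameters described above might look like this (page size is arbitrary; "hits" and "total" are the keys the SuperSearch response uses to report results):

import requests  # third-party HTTP client

URL = "https://crash-stats.mozilla.com/api/SuperSearch/"
params = {
    "date": [">=2015-05-06T00:00:00", "<2015-05-07T00:00:00"],
    "_results_number": 100,  # page size; the API default is 100
    "_results_offset": 0,
}

crashes = []
while True:
    data = requests.get(URL, params=params).json()
    crashes.extend(data["hits"])
    if not data["hits"] or len(crashes) >= data["total"]:
        break
    params["_results_offset"] += params["_results_number"]

print(len(crashes), "crashes fetched")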
Depends on: 1144569
Flags: needinfo?(adrian)
Comment 7•9 years ago
We've just pushed to AWS - do you need any help moving over to the API for this, or do you need us to continue producing a CSV?
We're not going to be able to push to servers in Mozilla's datacenters, though, unfortunately.
No longer blocks: 1118288
Flags: needinfo?(bob)
Comment 8•9 years ago
I have one from yesterday which I'll load as soon as the current set completes. That will hold me over for several days at least. I haven't had a chance to look at this while I've been working on Autophone issues, but I should be able to get something together soon. If I have problems, I'll ask for help. No need to keep producing the CSV file any more.
Flags: needinfo?(bob)
Comment 9•9 years ago
Lars, bsmedberg asked that we re-enable the CSV job since folks on the graphics team and external users are using it. Can we come up with a better way to produce this so you can land your refactoring patch? (I know the current method uses a bunch of old code we'd like to get rid of.)
Flags: needinfo?(lars)
Summary: rework Sisyphus data source to support Socorro's new home in AWS → disabled daily CSV cron job
Updated•9 years ago
Summary: disabled daily CSV cron job → disable daily CSV cron job
Comment 10•9 years ago
I believe I can use SuperSearchUnredacted to get what I need. I am going to leave this bug open for any work you may need to complete, but I will be making the necessary changes to Bughunter in bug 1185498.
Comment 11•9 years ago
FYI, Bughunter is now using the Socorro search API. Thanks, all.
Comment 12•9 years ago
Bughunter is loading urls directly from Socorro (bug 1192646) and no longer uses the daily csv dump files. Is there anything else that needs to be done here to support others' uses or can we mark this fixed?
Updated•9 years ago
Flags: needinfo?(lars)
Comment 13•8 years ago
-> fixed as far as I know.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Updated•6 years ago
Product: Testing → Testing Graveyard