Query access to backend data

RESOLVED FIXED in 16

Status

Socorro
General
RESOLVED FIXED
8 years ago
6 years ago

People

(Reporter: dmandelin, Assigned: laura)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [Q22012wanted][qa-])

(Reporter)

Description

8 years ago
I like to do historical analysis on crash report data. Direct query access (SQL or whatever) to the backend would make it a lot easier. Currently I'm using Python scripts to drive the web interface and parse results, but that's clunky and inefficient.
Some concern that long-running queries might impact the ability of the server to timely accept incoming data; and that allowing free-form queries would lead to such a state.

Would it be reasonable to add csv format from the web interface and use that output? I believe at least some pages already provide that.
Assignee: nobody → server-ops
Component: Socorro → Server Operations
Product: Webtools → mozilla.org
QA Contact: socorro → mrz
Version: Trunk → other
mrz - can we get dave access to this data somehow?  Aravind?

If it helps them to query the database directly via some read-only account, or give them a stage db that contains a copy, I'm all for it.  Possible?

Comment 3

8 years ago
I share Frank's concern.  Is there a way to do this and cut off David if it becomes a problem?  

David, you'd need to know access will be cut off if it's service impacting (shoot first, ask questions later).

Aravind, what do you think?
Assignee: server-ops → aravind
(Reporter)

Comment 4

8 years ago
(In reply to comment #3)
> David, you'd need to know access will be cut off if it's service impacting
> (shoot first, ask questions later).

Of course. Or maybe you could run my jobs at a lower priority if possible.

Btw, in the past I have done some historical bug analysis by writing scripts that query the web page, e.g., one query for each day in a 3-month time period. You guys never complained about that :-) so I assume it wasn't service impacting. What I'm really asking for in this bug is a way to do that kind of thing that is less clunky (some sort of query language or map-reduce expression vs. a patchwork of HTML parsing scripts) and hopefully faster because it's more direct. Ideally, it would be less perf-impacting overall but with same or better perf since I wouldn't be touching the web/HTML parts as much.
I think this would be better achieved through the python middleware layer we have deployed.

@lars: is this feasible, would it make sense to provide access like this to folks through the middleware layer?

Comment 7

8 years ago
(In reply to comment #5)
Another thing the Postgres folks advocated was creating a special user/password. This will allow us to have different max memory, query timeout, etc for this privileged user to fine tune resource usage.
Assignee: aravind → nobody
Component: Server Operations → Socorro
Product: mozilla.org → Webtools
QA Contact: mrz → socorro
(Assignee)

Comment 8

8 years ago
Let's hold this on rewriting the middleware (1.7) and review for 1.8.  Should be a generally useful feature.
Target Milestone: --- → 1.8
(Assignee)

Comment 9

8 years ago
This will be a 2.x feature.
Target Milestone: 1.8 → 2.0
(Assignee)

Comment 10

7 years ago
dmandelin: for right now, if you have a query file a bug for Metrics.  We'll likely solve this with ES ad hoc queries.
Target Milestone: 2.0 → 2.2
(Reporter)

Comment 11

7 years ago
(In reply to comment #10)
> dmandelin: for right now, if you have a query file a bug for Metrics.  We'll
> likely solve this with ES ad hoc queries.

OK, thanks.
(Assignee)

Updated

7 years ago
Target Milestone: 2.2 → 2.3
Target Milestone: 2.3 → 2.3.1
(In reply to David Mandelin from comment #0)
> I like to do historical analysis on crash report data. Direct query access
> (SQL or whatever) to the backend would make it a lot easier. Currently I'm
> using Python scripts to drive the web interface and parse results, but
> that's clunky and inefficient.

Just a note that we do drop a CSV file (which is a sanitized dump of the main reports table) nightly:
https://crash-analysis.mozilla.com/crash_analysis/20111005/20111005-pub-crashdata.csv.gz

Given the crash ID in that file, you can access individual reports like so:
https://crash-stats.mozilla.com/dumps/446e67d0-def4-43e7-8228-7d62e2110316.jsonz
(Assignee)

Comment 13

7 years ago
We have plans for this, but not until early 2012.  De-targeting until then.

If you need something specific let us know.
Whiteboard: [1Q2012]
Target Milestone: 2.3.1 → ---
(Assignee)

Updated

6 years ago
Whiteboard: [1Q2012] → [Q12012wanted]
(Assignee)

Comment 14

6 years ago
Basic plans here are:
- Add a VM to run SQL against (some subset of? secondary?) PostgreSQL, depending on DBA advice
- Add a VM to run queries against HBase
- Open up our API internally for other people to run apps/scripts against
Assignee: nobody → laura
(Assignee)

Updated

6 years ago
Target Milestone: --- → 2.4.2
Component: Socorro → General
Product: Webtools → Socorro
(Assignee)

Comment 15

6 years ago
Re item 2, above:
Dave: You now have access to query HBase via Hive (looks like SQL, maps to MapReduce) and Pig (uses its own query language, called Pig Latin, maps to MapReduce).  Details coming in 
https://bugzilla.mozilla.org/show_bug.cgi?id=708452

Re item 1:
Do you also need access to run SQL against the secondary PostgreSQL instance?  If so I'll get a bug on file for that.  

Re item 3: We're planning on opening the API internally around the end of Q1.

Let me know if you have any questions.
(Assignee)

Updated

6 years ago
Target Milestone: 2.4.2 → 2.4.3

Updated

6 years ago
Target Milestone: 2.4.3 → 2.4.4
(Assignee)

Comment 16

6 years ago
Done except for the API part, revisit in a month or so.
Target Milestone: 2.4.4 → 2.6
(Assignee)

Updated

6 years ago
Target Milestone: 4 → 5
(Assignee)

Updated

6 years ago
Whiteboard: [Q12012wanted] → [Q22012wanted]
Target Milestone: 5 → 9
(Assignee)

Updated

6 years ago
Target Milestone: 9 → 13
(Assignee)

Updated

6 years ago
Target Milestone: 13 → 14
(Assignee)

Updated

6 years ago
Target Milestone: 14 → 16
(Assignee)

Updated

6 years ago
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED

Updated

6 years ago
Whiteboard: [Q22012wanted] → [Q22012wanted][qa-]
You need to log in before you can comment on or make changes to this bug.