Closed Bug 529946 Opened 15 years ago Closed 12 years ago

Query access to backend data

Categories

(Socorro :: General, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dmandelin, Assigned: laura)

Details

(Whiteboard: [Q22012wanted][qa-])

I'd like to do historical analysis on crash report data. Direct query access (SQL or whatever) to the backend would make it a lot easier. Currently I'm using Python scripts to drive the web interface and parse results, but that's clunky and inefficient.
There is some concern that long-running queries might impact the server's ability to accept incoming data in a timely fashion, and that allowing free-form queries would lead to such a state.
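The "drive the web interface" workflow described above might look roughly like the sketch below: build one search URL per day over a date range, then (in real use) fetch and parse each results page. This is a hypothetical illustration; the base URL path and the `signature`/`date_start`/`date_end` parameter names are assumptions, not Socorro's actual query API.

```python
from datetime import date, timedelta

# Illustrative base URL; not necessarily the real query endpoint.
BASE = "https://crash-stats.mozilla.com/query"

def daily_query_urls(start, end, signature):
    """Yield one search URL per day in [start, end)."""
    day = start
    while day < end:
        nxt = day + timedelta(days=1)
        yield (f"{BASE}?signature={signature}"
               f"&date_start={day.isoformat()}&date_end={nxt.isoformat()}")
        day = nxt

# One query per day over a 3-day window; a real script would fetch each
# URL and scrape the HTML results, which is the clunky part.
urls = list(daily_query_urls(date(2009, 9, 1), date(2009, 9, 4), "js_Interpret"))
```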

Would it be reasonable to add a CSV output format to the web interface and use that? I believe at least some pages already provide that.
Assignee: nobody → server-ops
Component: Socorro → Server Operations
Product: Webtools → mozilla.org
QA Contact: socorro → mrz
Version: Trunk → other
mrz - can we get dave access to this data somehow?  Aravind?

If it helps, we could let them query the database directly via some read-only account, or give them a staging DB that contains a copy. I'm all for it. Possible?
I share Frank's concern.  Is there a way to do this and cut off David if it becomes a problem?  

David, you'd need to know access will be cut off if it's service impacting (shoot first, ask questions later).

Aravind, what do you think?
Assignee: server-ops → aravind
(In reply to comment #3)
> David, you'd need to know access will be cut off if it's service impacting
> (shoot first, ask questions later).

Of course. Or maybe you could run my jobs at a lower priority if possible.

Btw, in the past I have done some historical bug analysis by writing scripts that query the web page, e.g., one query for each day in a 3-month time period. You guys never complained about that :-) so I assume it wasn't service impacting. What I'm really asking for in this bug is a way to do that kind of thing that is less clunky (some sort of query language or map-reduce expression vs. a patchwork of HTML parsing scripts) and hopefully faster because it's more direct. Ideally it would have less overall performance impact, with the same or better performance for me, since I wouldn't be touching the web/HTML layers as much.
I think this would be better achieved through the python middleware layer we have deployed.

@lars: is this feasible, would it make sense to provide access like this to folks through the middleware layer?
(In reply to comment #5)
Another thing the Postgres folks advocated was creating a special user/password. This would allow us to set different max memory, query timeouts, etc. for this privileged user to fine-tune resource usage.
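For reference, PostgreSQL supports per-role defaults for settings like `statement_timeout` and `work_mem` via `ALTER ROLE ... SET`, which is one way to implement the tuning described above. A minimal sketch of generating such statements follows; the role name and values are illustrative assumptions, not a decided configuration.

```python
def role_tuning_sql(role, settings):
    """Build ALTER ROLE statements applying per-role resource limits.

    PostgreSQL applies these as session defaults whenever the role logs in,
    so the privileged analytics user gets its own timeout/memory caps.
    """
    return [f"ALTER ROLE {role} SET {name} = '{value}';"
            for name, value in settings.items()]

# Hypothetical role and limits for a read-only analytics account.
stmts = role_tuning_sql("crash_analytics_ro", {
    "statement_timeout": "5min",  # abort queries that run too long
    "work_mem": "64MB",           # cap per-operation sort/hash memory
})
```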
Assignee: aravind → nobody
Component: Server Operations → Socorro
Product: mozilla.org → Webtools
QA Contact: mrz → socorro
Let's hold this on rewriting the middleware (1.7) and review for 1.8.  Should be a generally useful feature.
Target Milestone: --- → 1.8
This will be a 2.x feature.
Target Milestone: 1.8 → 2.0
dmandelin: for right now, if you have a query, file a bug for Metrics.  We'll likely solve this with ES ad hoc queries.
Target Milestone: 2.0 → 2.2
(In reply to comment #10)
> dmandelin: for right now, if you have a query file a bug for Metrics.  We'll
> likely solve this with ES ad hoc queries.

OK, thanks.
Target Milestone: 2.2 → 2.3
Target Milestone: 2.3 → 2.3.1
(In reply to David Mandelin from comment #0)
> I like to do historical analysis on crash report data. Direct query access
> (SQL or whatever) to the backend would make it a lot easier. Currently I'm
> using Python scripts to drive the web interface and parse results, but
> that's clunky and inefficient.

Just a note that we do drop a CSV file (which is a sanitized dump of the main reports table) nightly:
https://crash-analysis.mozilla.com/crash_analysis/20111005/20111005-pub-crashdata.csv.gz

Given the crash ID in that file, you can access individual reports like so:
https://crash-stats.mozilla.com/dumps/446e67d0-def4-43e7-8228-7d62e2110316.jsonz
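Consuming that nightly dump could look something like the sketch below: parse the file with the stdlib `csv` module and build per-crash report URLs using the pattern shown above. The inline sample data and the `crash_id`/`signature` column names are assumptions for illustration; the real dump's column layout may differ.

```python
import csv
import io

# Tiny inline stand-in for the nightly dump; the real file is a gzipped,
# sanitized export of the main reports table.
SAMPLE = (
    "signature\tcrash_id\n"
    "js_Interpret\t446e67d0-def4-43e7-8228-7d62e2110316\n"
)

def report_url(crash_id):
    """Build the per-crash report URL (pattern quoted in the comment above)."""
    return f"https://crash-stats.mozilla.com/dumps/{crash_id}.jsonz"

rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))
urls = [report_url(r["crash_id"]) for r in rows]
```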
We have plans for this, but not until early 2012.  De-targeting until then.

If you need something specific let us know.
Whiteboard: [1Q2012]
Target Milestone: 2.3.1 → ---
Whiteboard: [1Q2012] → [Q12012wanted]
Basic plans here are:
- Add a VM to run SQL against (some subset of? secondary?) PostgreSQL, depending on DBA advice
- Add a VM to run queries against HBase
- Open up our API internally for other people to run apps/scripts against
Assignee: nobody → laura
Target Milestone: --- → 2.4.2
Component: Socorro → General
Product: Webtools → Socorro
Re item 2, above:
Dave: You now have access to query HBase via Hive (looks like SQL, maps to MapReduce) and Pig (uses its own query language, called Pig Latin, maps to MapReduce).  Details coming in 
https://bugzilla.mozilla.org/show_bug.cgi?id=708452

Re item 1:
Do you also need access to run SQL against the secondary PostgreSQL instance?  If so I'll get a bug on file for that.  

Re item 3: We're planning on opening the API internally around the end of Q1.

Let me know if you have any questions.
Target Milestone: 2.4.2 → 2.4.3
Target Milestone: 2.4.3 → 2.4.4
Done except for the API part, revisit in a month or so.
Target Milestone: 2.4.4 → 2.6
Target Milestone: 4 → 5
Whiteboard: [Q12012wanted] → [Q22012wanted]
Target Milestone: 5 → 9
Target Milestone: 9 → 13
Target Milestone: 13 → 14
Target Milestone: 14 → 16
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Whiteboard: [Q22012wanted] → [Q22012wanted][qa-]