Closed
Bug 1023176
Opened 10 years ago
Closed 10 years ago
[Baloo] Setup/configure Bagheera for Baloo
Categories
(Mozilla Metrics :: Data/Backend Reports, defect)
Tracking
(Not tracked)
RESOLVED
DUPLICATE
of bug 1024059
Unreviewed
People
(Reporter: pierros, Assigned: scabral)
Details
Hello, we would like to set up an entry point in Bagheera specific to Baloo so that submitted payloads end up in HBASE.
https://data.mozilla.com/submit/baloo/
Thanks!
Reporter
Updated • 10 years ago
Group: metrics-private
Assignee
Comment 1 • 10 years ago
Pierros - the data warehouse team will handle the intermediate database. We haven't identified a need for HBASE; right now the aggregations are simple enough that we're doing them in a MySQL database.
Does Bagheera require HBASE? Is Bagheera required for Baloo? It seems like we could run the aggregations in a relational database, like MySQL or Postgres, and put Tableau in front of it. That way each functional/product area (coding, QA, etc.) can come up with its own reports and drill-downs.
The idea is that this lessens the work on the Metrics folks, because we eliminate unnecessary aggregations. The Metrics folks will still have access to all the data, through the data warehouse, so there's no loss or difference. It also lessens the dependence on HBASE, which is overkill for the data we have, and adds the step to convert from a relational schema to JSON for HBASE.
It just seems like moving all the data to HBASE is overkill and adds a dependence that isn't needed.
Assignee: nobody → scabral
Assignee
Comment 2 • 10 years ago
In discussion today with Pierros, Adam and David, and after a follow-up with :tmary, it seems that the flow should be:
0) Submit the data to Kafka using the schema at https://wiki.mozilla.org/Baloo/Schema/0.1
(To be done by Sheeri for the Bugzilla data, and by Pierros, Adam, etc. for the rest)
1) The BI/DW team will write a consumer to read from Kafka and store the data in the endpoint (to be done by tmary)
2) The BI/DW team will aggregate the activity from *all* the submissions, including de-duping (to be done by Sheeri)
3) The BI/DW team will extract the results of the aggregations and submit them to Adam using the schema at https://docs.google.com/document/d/16Sas-dbBzSftWqacYhFRojjXLCkAXu6h2XxbkfALG-Q/edit#heading=h.6b13hi21db4l
Do those steps sound right to you? The end result is that Adam gets the data in the format he needs. For now, let's focus on the data for the year ending Sunday, June 8th (I think that's the data point for June 2nd? Or June 9th?)
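As a rough illustration of step 2, a minimal Python sketch of de-duping and then counting activity might look like the following. The field names (`contributor`, `source`, `action`, `timestamp`) are hypothetical and are not taken from the Baloo schema; the real key for spotting duplicates would depend on that schema.

```python
from collections import Counter

# Hypothetical fields that together identify one activity record.
# These names are assumptions for illustration, not the Baloo schema.
DEDUPE_KEY = ("contributor", "source", "action", "timestamp")


def dedupe(events):
    """Return events with exact repeats (by DEDUPE_KEY) removed, keeping first-seen order."""
    seen = set()
    unique = []
    for event in events:
        key = tuple(event[field] for field in DEDUPE_KEY)
        if key not in seen:
            seen.add(key)
            unique.append(event)
    return unique


def count_by_source(events):
    """Count de-duplicated events per source system (e.g. bugzilla, github)."""
    return dict(Counter(event["source"] for event in dedupe(events)))
```

Keying on a tuple of fields means a record submitted twice is counted once, while two distinct actions by the same contributor are both kept.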
Assignee
Comment 3 • 10 years ago
:tmary has a consumer for #1, but it needs testing with a real data extract (list item 0).
I'm working on that (#0) from the MySQL data I have, although if Pierros and Adam have JSON for their parts, that will be a good test too.
Updated • 10 years ago
Summary: [Baloo] Setup bagheera production entry point for Baloo → [Baloo] Setup/configure Bagheera for Baloo
Comment 4 • 10 years ago
I've just shared the GitHub data via bug 1010190.
Assignee
Comment 5 • 10 years ago
Adam,
I was able to parse the data in CSV format and get it into our midpoint data store for aggregations.
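For anyone following along, a minimal sketch of that kind of CSV load is below. It uses Python with an in-memory SQLite database as a stand-in for the midpoint store; the table name, column names and CSV header are all assumptions for illustration and do not reflect the actual data or the store that was used.

```python
import csv
import io
import sqlite3


def load_csv(conn, csv_text):
    """Parse CSV text with a header row and insert each row into a hypothetical activity table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS activity "
        "(contributor TEXT, source TEXT, happened_on TEXT)"
    )
    # Materialize the rows so we can report how many were loaded.
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    conn.executemany(
        "INSERT INTO activity VALUES (:contributor, :source, :happened_on)",
        rows,
    )
    conn.commit()
    return len(rows)


def count_by_source(conn):
    """Simple aggregation: number of activity rows per source, sorted by source name."""
    cursor = conn.execute(
        "SELECT source, COUNT(*) FROM activity GROUP BY source ORDER BY source"
    )
    return cursor.fetchall()
```

Once the rows are in a relational table, the aggregations are plain `GROUP BY` queries.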
Assignee
Updated • 10 years ago
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → DUPLICATE