Closed
Bug 970406
Opened 11 years ago
Closed 11 years ago
Create crontabber job to pull ADI data, rather than receive a push from metrics
Categories
(Socorro :: General, task)
Socorro
General
Tracking
(Not tracked)
RESOLVED
FIXED
91
People
(Reporter: selenamarie, Assigned: selenamarie)
References
Details
(Whiteboard: [DB change])
Look into what it will take to pull data and populate our raw_adu tables.
To start:
* Document flow of data from load balancers to Vertica
* Document logic in ADI rollup
* Investigate ways of implementing a pull system for data
| Assignee | ||
Updated•11 years ago
|
Assignee: nobody → sdeckelmann
Comment 1•11 years ago
|
||
Note that this is rather urgent because we do not get any ADI for Nightly right now, as the ADI push does not include ADI from the new blocklist.amo systems which are already used exclusively for Nightly and will soon be used for everything.
Comment 2•11 years ago
|
||
The new ADI source is missing channel and build, and possibly other dimensions.
Comment 3•11 years ago
|
||
The metrics block list ping format is defined at https://intranet.mozilla.org/Metrics/Blocklist
Current BAMO log line:
115.42.79.246 blocklist.addons.mozilla.org - [18/May/2014:11:00:17
-0700] "GET /blocklist/3/{ec8030f7-c20a-464f-9b0e-13a3a9e97384}/32.0a1/
HTTP/1.1" 200 117560 "-" "Mozilla/5.0 (Windows NT 5.1; rv:32.0)
Gecko/20100101 Firefox/32.0" "-" "DNT:-"
Old/valid log line:
raw 114.143.68.17 addons.mozilla.org null
18/May/2014:10:16:07 +0000 GET
/blocklist/3/%7Bec8030f7-c20a-464f-9b0e-13a3a9e97384%7D/29.0.1/Firefox/20140506152807/WINNT_x86-msvc/en-US/release/Windows_NT%205.1/default/default/6/215/1/
200 117405 - Mozilla/5.0 (Windows NT 5.1; rv:29.0)
Gecko/20100101 Firefox/29.0
__utmx=150903082.NxsqlQfBRSSBvkbX6YJy7w$71153379-26:1;
__utmxx=150903082.NxsqlQfBRSSBvkbX6YJy7w$71153379-26:1375365532:15552000;
__utma=150903082.1542016220.1375365555.1375365555.1375365555.1
DNT:-
The current BAMO format is missing quite a few parameters. We won't be able to decipher channel name and other information based on the current BAMO ping.
Comment 4•11 years ago
|
||
(In reply to Anurag Phadke[:aphadke@mozilla.com] from comment #3)
> The current BAMO format is missing quite a few parameters. We won't be able
> to decipher channel name and other information based on the current BAMO
> ping.
Hrm, that sounds like a problem in logging, then, we send the exact same ping with the same fields and length to BAMO than what we did send to AMO.
That sucks, I guess I need to revert the bug 1006615 patches as we need to get decent data again by the start of next week.
Comment 5•11 years ago
|
||
OK, I backed out the switch to BAMO and I will not land it again before we can't get that data into Socorro in some way.
Note that even with this, the issue is urgent, as we know we are missing ADI and have unstable data with the "old" AMO systems serving the blocklist and ADI data.
Blocks: 1006615
Comment 6•11 years ago
|
||
FWIW, with some help of https://wiki.mozilla.org/QA/Plugins/CTP_Blocklist#Instructions and the network tab in the Browser Toolbox of devtools, I can verify that my local build with the patch calls the following URL:
https://blocklist.addons.mozilla.org/blocklist/3/%7Bec8030f7-c20a-464f-9b0e-13a3a9e97384%7D/32.0a1/Firefox/20140520131030/Linux_x86_64-gcc3/en-US/default/Linux%203.14.0-2-desktop%20(GTK%202.24.23)/default/default/invalid/invalid/invalid/
So it doesn't seem client-side to me.
| Assignee | ||
Comment 7•11 years ago
|
||
> Blocks: 1020320
Can we get a document that explains how this export process is blocking the ability to move to blocklist.addons.org?
I'm confused about where data is going to and coming from and how the new process differs from the old with regard to the v2_raw_logs table.
What I want: two diagrams - one that shows the data flow from the old system into whatever the old tables are (that we are currently pulling ADI from) and a new diagram that shows the data flow in the new system (that will talk with blocklist.addons.org).
Thank you!
| Assignee | ||
Comment 8•11 years ago
|
||
progress!
(hive)[sdeckelmann@socorro1.dev.db.phx1 hive-test]$ python test-script.py > /pgdata/tmp/test_hive/full_output
QUERY START 06/06/14 09:24:08AM
QUERY COMPLETE 06/06/14 09:27:23AM
DONE 06/06/14 09:28:33AM
Moving this into a cronjob that can push data into postgres.
| Assignee | ||
Comment 9•11 years ago
|
||
(hive)[sdeckelmann@socorro1.dev.db.phx1 hive-test]$ python test-script.py > /pgdata/tmp/test_hive/full_output
QUERY START 06/06/14 01:09:58PM
QUERY COMPLETE 06/06/14 01:13:04PM
DONE 06/06/14 01:14:30PM
(hive)[sdeckelmann@socorro1.dev.db.phx1 hive-test]$ wc /pgdata/tmp/test_hive/full_output
188990 1699415 21978586 /pgdata/tmp/test_hive/full_output
(hive)[sdeckelmann@socorro1.dev.db.phx1 hive-test]$ time python load-data.py
real 0m2.274s
user 0m0.039s
sys 0m0.031s
So we're looking at it taking about 5-6 minutes to do a rollup of this data and get it into Postgres from Hive :)
Now looking into normalizing it for inserting into our raw_adu table.
| Assignee | ||
Comment 10•11 years ago
|
||
Side note: I investigated using StringIO instead of a temporary file. Unfortunately it added 10 minutes to the runtime vs using a temporary file. :/
QUERY START 10/06/14 02:41:09PM
QUERY COMPLETE 10/06/14 02:53:25PM
DONE 10/06/14 02:54:36PM
I'll set up two implementations of this in the event that we decide that's a reasonable tradeoff.
Comment 11•11 years ago
|
||
Commits pushed to master at https://github.com/mozilla/socorro
https://github.com/mozilla/socorro/commit/cc1c0679f21b3759d5982e21cf84125a989e6ea1
Fixes bug 970406 - request ADI/blocklist ping data from Hive daily
* adds new crontabber job: fetch_adi_from_hive
* adds pyhs2 dependency
* adds a new table for raw log aggregate data
https://github.com/mozilla/socorro/commit/b06802119b838124188523ab419547b5217aeff2
Migration for bug 970406
https://github.com/mozilla/socorro/commit/c69284ca88e82739cf400231c06f6e6c467926b4
Fixes bug 970406 r=peterbe
https://github.com/mozilla/socorro/commit/2c0e7a7cac8746b897abad2c023ff49e45863102
Merge pull request #2102 from selenamarie/bug970406-pull-adi-crontabber
Fixes bug 970406 - pull ADI/blocklist pings from Hive with crontabber
Updated•11 years ago
|
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
| Assignee | ||
Updated•11 years ago
|
Target Milestone: --- → 90
| Assignee | ||
Updated•11 years ago
|
Whiteboard: [DB change]
Comment 12•11 years ago
|
||
Commit pushed to master at https://github.com/mozilla/socorro
https://github.com/mozilla/socorro/commit/12b6eb545967a610697df516f0820175a530ab4f
Fixes bug 970406 - rebase for migration branch
Updated•11 years ago
|
Target Milestone: 90 → 91
| Assignee | ||
Updated•11 years ago
|
| Assignee | ||
Updated•11 years ago
|
Summary: Update our ADI process to pull data, rather than receive a push from metrics → Create crontabber job to pull ADI data, rather than receive a push from metrics
You need to log in
before you can comment on or make changes to this bug.
Description
•