Closed Bug 1036559 Opened 10 years ago Closed 10 years ago

Make ADI pull from HIVE also retrieve the BAMO entries

Categories

(Socorro :: General, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: kairo, Assigned: rhelmer)

Attachments

(1 file)

Once Socorro is pulling ADI data from HIVE, we are ready to do the actual switch from addons.mozilla.org (AMO) to blocklist.addons.mozilla.org (BAMO) for the blocklist ping in our products.

My plan there is as follows:
1) Verify that Socorro is getting the correct AMO data from HIVE (matching the old ADI data push) - that's AFAIK underway now
2) Have Socorro pull HIVE data for BAMO as well as AMO <-- This bug!
3) Switch Nightly to direct its ping to BAMO instead of AMO (bug 1006615)
4) Verify that Socorro is still getting good data for Nightly
5) Put the 301 redirect in place to make all AMO pings go to BAMO (bug 1020320, currently scheduled for 7/14)
6) Switch Socorro to use BAMO data only (if the original pings to AMO that get redirected end up producing entries on both sides, we need to do that urgently together with the switch; if not, this can be done lazily).

As indicated, the bug here covers step 2.

If everything works out well, I'd like to do step 3 before this weekend, so that we can do step 4 (verification) on Monday and make the green light for step 5 dependent on that.
CCing Jason for info as he's doing step 5 (redirect).

Rob, should another bug be filed on step 6?

Jason, BTW, do you have the info to clear up the question in brackets in my step 6 description?
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #0)

> Jason, BTW, do you have the info to clear up with brackets in my step 6
> description?

Yes we will see entries on both sides. BLP requests will continue to be logged in the AMO netscaler logs but the HTTP Response will be 301 instead of 200.
Blocks: 1011648
(In reply to Jason Thomas [:jason] from comment #1)
> (In reply to Robert Kaiser (:kairo@mozilla.com) from comment #0)
> 
> > Jason, BTW, do you have the info to clear up with brackets in my step 6
> > description?
> 
> Yes we will see entries on both sides. BLP requests will continue to be
> logged in the AMO netscaler logs but the HTTP Response will be 301 instead
> of 200.

Hmm, should we make sure to only actually record data for HTTP response code 200? Do we know if we previously filtered for that when retrieving ADI?
(In reply to Robert Kaiser (:kairo@mozilla.com, slow reaction due to vacation backlog) from comment #0)
> 2) Have Socorro pull HIVE data for BAMO as well as AMO <-- This bug!

What needs to be done here - is this just a matter of changing the Hive SQL query?
The current query is:
https://github.com/mozilla/socorro/blob/master/socorro/cron/jobs/fetch_adi_from_hive.py#L40
Flags: needinfo?(tmeyarivan)
Flags: needinfo?(jthomas)
I think we need to update the query from 'domain=addons.mozilla.org' to 'domain=blocklist.addons.mozilla.org' if we don't want to include addons.mozilla.org 301's.
Flags: needinfo?(jthomas)
I want to note I haven't worked with HIVE at all before so comment 4 may be incorrect.
(In reply to Robert Helmer [:rhelmer] from comment #3)
> What needs to be done here - is this just a matter of changing the Hive SQL
> query?

Yes. Ideally we would switch in a way that we do not actually query for 301 responses (we may want to exclude any other errors as well so may want to only count 200s in the end).

(In reply to Jason Thomas [:jason] from comment #4)
> I think we need to update the query from 'domain=addons.mozilla.org' to
> 'domain=blocklist.addons.mozilla.org' if we don't want to include
> addons.mozilla.org 301's.

If we did a complete change from one to the other, we'd need to sync the change here exactly with the switch to the 301. I do not really want us to rely on doing that.
(In reply to Robert Kaiser (:kairo@mozilla.com, slow reaction due to vacation backlog) from comment #6)
> (In reply to Robert Helmer [:rhelmer] from comment #3)
> > What needs to be done here - is this just a matter of changing the Hive SQL
> > query?
> 
> Yes. Ideally we would switch in a way that we do not actually query for 301
> responses (we may want to exclude any other errors as well so may want to
> only count 200s in the end).


Isn't this up to whatever tool is doing the parsing of the logs? If it's possible to get the HTTP status code as part of the query, I could exclude those.


> (In reply to Jason Thomas [:jason] from comment #4)
> > I think we need to update the query from 'domain=addons.mozilla.org' to
> > 'domain=blocklist.addons.mozilla.org' if we don't want to include
> > addons.mozilla.org 301's.
> 
> If we'd do a complete change from one to the other, we'd need to exactly
> sync the change here with the switch to the 301. I do not really want us to
> rely on doing that.


Yes I'd prefer something we can transition over to more smoothly - if I can get the HTTP status code in the query and do something like:

WHERE domain='addons.mozilla.org' OR domain='blocklist.addons.mozilla.org'

this would probably be ideal.
Sheeri, please see comment 3 and the responses. Do you know what we need to be querying from Hive to change over from AMO to BAMO?
Flags: needinfo?(scabral)
(In reply to Robert Helmer [:rhelmer] from comment #7)
> Yes I'd prefer something we can transition over to more smoothly - if I can
> get the HTTP status code in the query and do something like:
> 
> WHERE domain='addons.mozilla.org' OR domain='blocklist.addons.mozilla.org'
> 
> this would probably be ideal.

The HTTP status code is available via the column 'http_status_code'. Regarding querying both domains, "... (domain=A OR domain=B) AND ..." should be sufficient.
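
As a rough illustration of the filter described here, the following Python sketch builds such a WHERE-clause fragment. Only the column names `domain` and `http_status_code` and the two domain names come from this bug; the helper name `build_ping_filter` is hypothetical, the numeric comparison on `http_status_code` is an assumption about the column type, and the real query in fetch_adi_from_hive.py contains further conditions that are not shown.

```python
# Hypothetical helper, not taken from Socorro: builds the domain/status
# filter discussed in this bug. Only `domain` and `http_status_code`
# are column names mentioned in the thread.
AMO = "addons.mozilla.org"
BAMO = "blocklist.addons.mozilla.org"


def build_ping_filter(domains=(AMO, BAMO), status_code=200):
    """Return a WHERE-clause fragment matching any of `domains` with one status."""
    domain_clause = " OR ".join("domain='%s'" % d for d in domains)
    # Parenthesize the OR group so the AND applies to every domain.
    # Assumption: http_status_code compares as a number; quote it if the
    # column turns out to be a string.
    return "(%s) AND http_status_code=%d" % (domain_clause, status_code)


print(build_ping_filter())
```

The parentheses matter: without them, AND binds tighter than OR, and the status filter would apply only to the second domain.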

--


--
Flags: needinfo?(tmeyarivan)
(In reply to T [:tmary] Meyarivan from comment #9)
> (In reply to Robert Helmer [:rhelmer] from comment #7)
> > Yes I'd prefer something we can transition over to more smoothly - if I can
> > get the HTTP status code in the query and do something like:
> > 
> > WHERE domain='addons.mozilla.org' OR domain='blocklist.addons.mozilla.org'
> > 
> > this would probably be ideal.
> 
> HTTP status code is available via column 'http_status_code' - re querying
> both domains, ".... (domain=A or domain = B) AND ...." should be sufficient
> 
> --
> 
> 
> --

This looks perfect, thanks!
Flags: needinfo?(scabral)
Hey tmary, do you mind reviewing this change? I *think* this will be safe to start using right away, then we will be ready for the BAMO transition.
Attachment #8472725 - Flags: review?(tmeyarivan)
(In reply to Robert Helmer [:rhelmer] from comment #11)
> Created attachment 8472725 [details] [review]
> modify hive query to pull BAMO and AMO, for only HTTP 200 results
> 
> Hey tmary, do you mind reviewing this change? I *think* this will be safe to
> start using right away, then we will be ready for the BAMO transition.

+1

--
(In reply to Robert Helmer [:rhelmer] from comment #11)
> Created attachment 8472725 [details] [review]
> modify hive query to pull BAMO and AMO, for only HTTP 200 results
> 
> Hey tmary, do you mind reviewing this change? I *think* this will be safe to
> start using right away, then we will be ready for the BAMO transition.

tmary, per IRC I tested with and without the BAMO domain and the results look reasonable. I see nightly and nightly-ux Firefox and Thunderbird when filtering on BAMO only, with older build IDs, e.g. 20140519030202, which is what I'd expect.

KaiRo, does that make sense ^? Also this AMO->BAMO change would land on Nightly and ride the trains, so this strategy of pulling only HTTP 200s for both domains should allow us to transition, AFAICT. What do you think?
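
To illustrate why counting only HTTP 200s across both domains should survive the transition, here is a small Python sketch with made-up log rows. The row layout, the helper name `count_pings`, and the assumption that a redirected client is logged once on AMO with a 301 and once on BAMO with a 200 are illustrative only; they are based on the description in comment 1, not on real log data.

```python
from collections import Counter

AMO = "addons.mozilla.org"
BAMO = "blocklist.addons.mozilla.org"


def count_pings(rows):
    """Count rows that hit either domain and returned HTTP 200."""
    counts = Counter()
    for domain, status in rows:
        if domain in (AMO, BAMO) and status == 200:
            counts[domain] += 1
    return counts


# Made-up rows of (domain, http_status_code) for three clients.
# Before the redirect: two clients ping AMO, one Nightly client pings BAMO.
before_redirect = [(AMO, 200), (AMO, 200), (BAMO, 200)]
# After the redirect: the two AMO clients each log a 301 on AMO and a 200
# on BAMO; the Nightly client still pings BAMO directly.
after_redirect = [(AMO, 301), (BAMO, 200), (AMO, 301), (BAMO, 200), (BAMO, 200)]

print(sum(count_pings(before_redirect).values()))
print(sum(count_pings(after_redirect).values()))
```

Under these assumptions each client is counted exactly once in both cases, so the query change does not have to be synchronized with the moment the 301 goes live.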
Cool - will push to land this before we go live on production then.
Status: NEW → ASSIGNED
KaiRo - please see comment 11 (which I meant to needinfo you on)
Flags: needinfo?(kairo)
Commits pushed to master at https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/25b0b714fbdc7ce4ee7163a47b92ae3f9fcef9f9
fix bug 1036559 - pull BAMO as well as AMO pings

https://github.com/mozilla/socorro/commit/13d369b528f530990028fb87e38cdd1ef2d82b39
Merge pull request #2287 from rhelmer/bug1036559-pull-from-bamo-and-amo

fix bug 1036559 - pull BAMO as well as AMO pings
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Landing this now so it can hit stage sooner - I would still appreciate any additional info.
(In reply to Robert Helmer [:rhelmer] from comment #13)
> I see nightly and nightly-ux Firefox and Thunderbird when
> filtering on BAMO only, older build IDs e.g. 20140519030202 which is what
> I'd expect.

Yes, also matches what I expect, due to the early experiments we did with Nightly back then.

> KaiRo, does that make sense ^? Also this AMO->BAMO change would land on
> Nightly and ride the trains, so this strategy of pulling only HTTP 200s for
> both domains should allow us to transition, AFAICT. What do you think?

Yes, sounds good, that's the ideal situation for this bug (we can file a followup later to remove the AMO-only part, but that can be done lazily once everything works fine with BAMO and the 301).

What I see in the patch also looks good to me.
Flags: needinfo?(kairo)
The query changes that tmary verified, and that you say make sense, are in fact the proper changes (from my own admittedly limited knowledge).
Attachment #8472725 - Flags: review?(tmeyarivan) → review+
Target Milestone: --- → 99
So, looks like both steps 1 and 2 from comment #0 are complete now, so I can move on to step 3 and beyond! \o/