Closed
Bug 1132660
Opened 10 years ago
Closed 9 years ago
Change to a lua geoip lib based on libmaxminddb
Categories
(Cloud Services Graveyard :: Metrics: Pipeline, defect, P2)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: mreid, Assigned: kparlante)
References
Details
(Whiteboard: [unifiedTelemetry][40b9][data-validation])
The https://github.com/agladysh/lua-geoip library uses the old-style ".dat" geo database. We should use a library that can read the GeoIP2 ".mmdb" style database.
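For illustration only, here is a minimal Python sketch of what a country lookup against a GeoIP2-style ".mmdb" record might look like. It is not the Lua library this bug asks for. The helper name `country_code` is hypothetical, the `reader` argument is assumed to be any object with a `get(ip)` method that returns a nested-dict record or `None` (the Python `maxminddb` package's reader behaves this way), and the "??" fallback mirrors the marker mentioned later in this thread.

```python
UNKNOWN = "??"  # marker used later in this thread for an unresolved country


def country_code(reader, ip, unknown=UNKNOWN):
    """Return the ISO country code for `ip`, or `unknown` if none is found.

    `reader` is assumed to expose get(ip) and return a GeoIP2-style record
    (nested dicts) or None when the address is not in the database.
    """
    record = reader.get(ip)
    if not record:
        return unknown
    # A record may lack a "country" entry or an "iso_code" inside it.
    return record.get("country", {}).get("iso_code", unknown)
```

With the `maxminddb` package installed, a reader from `maxminddb.open_database(path)` could be passed in directly; the path to the database file is left to the caller.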
Reporter
Updated•10 years ago
Priority: -- → P3
Reporter
Comment 1•10 years ago
Bumping up to P2 because I noticed some of my own submissions are showing up with geoCountry == "??"
Priority: P3 → P2
Reporter
Comment 2•10 years ago
Brendan,
Could you possibly take a look at the geoCountry info for FHRv2 vs. Unified? In particular, I'm interested in cases where:
- Unified geolocation differs from FHR, but both are set to specific country codes.
- Unified geo is "unknown" (denoted by "??"), while FHR is known.
- Ratio of "unknown" to "total" in FHR.
If you don't have bandwidth to take a look, please let me know. Thanks!
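The three cases above could be tallied with something like the following sketch. It assumes the FHR and Unified data have already been reduced to hypothetical dicts mapping a client id to a country code, and that FHR marks an unknown country with the same "??" string as Unified, which may not be true of the real data.

```python
from collections import Counter

UNKNOWN = "??"  # Unified's marker for an unknown geoCountry


def compare_geo(fhr, unified, unknown=UNKNOWN):
    """Tally geo discrepancies for clients present in both mappings.

    `fhr` and `unified` are hypothetical dicts of client id -> country code.
    Returns (tally, ratio), where ratio is unknown/total within FHR.
    """
    tally = Counter()
    for client_id in fhr.keys() & unified.keys():
        f, u = fhr[client_id], unified[client_id]
        if f != unknown and u != unknown and f != u:
            # Both sides resolved a country, but they disagree.
            tally["both_known_but_different"] += 1
        elif u == unknown and f != unknown:
            # Unified failed to resolve a country that FHR knows.
            tally["unified_unknown_fhr_known"] += 1
    fhr_unknown = sum(1 for code in fhr.values() if code == unknown)
    ratio = fhr_unknown / len(fhr) if fhr else 0.0
    return tally, ratio
```

The ratio is computed over all FHR clients, not only those that also appear in Unified, so that it describes FHR on its own.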
Flags: needinfo?(bcolloran)
Comment 3•10 years ago
I am unlikely to get to this question in the near future. It's now on my radar as something that needs to be looked at, but if someone else can do the looking that'd be great!
Flags: needinfo?(bcolloran)
Reporter
Comment 4•10 years ago
Sheeri, do you know what the FHRv2 data looks like from the perspective of Comment 2?
Flags: needinfo?(scabral)
Comment 5•10 years ago
Sorry it took a while to get to this.
From what I can gather, the FHR v2 stuff is in tables named:
fhr_rollups_daily_base
fhr_rollups_weekly_base
fhr_rollups_monly_base
We have other tables, but that data stopped being collected in March, so yell if that's more like what you expect.
That being said, here's what the data looks like with a sample entry from 6/1/2015 - the format is field name:sample data. I think what you're looking for is the 8th column down - "geo"....
vendor:Ebon
name:Ebon
channel:default
os:WINNT
osdetail:win7
distribution:
locale:en-US
geo:EU
version:34.0.8.8
isstdprofile:FALSE
stdchannel:other
stdos:Windows
distribtype:mozilla
snapshot:20150601
granularity:day
timeStart:2015-03-05
timeEnd:2015-03-05
tTotalProfiles:0
tExistingProfiles:0
tNewProfiles:0
tActiveProfiles:0
tInActiveProfiles:1
tActiveDays:0
tTotalSeconds:0
tActiveSeconds:0
tNumSessions:0
tCrashes:0
tTotalSearch:0
tGoogleSearch:0
tYahooSearch:0
tBingSearch:0
tOfficialSearch:0
tIsDefault:0
tIsActiveProfileDefault:0
t5outOf7:0
tChurned:0
tHasUP:0
Here's the top 5 geos from 6/1:
dbadmin=> select count(*),geo from fhr_rollups_daily_base where snapshot='20150601' group by geo order by 1 DESC limit 5;
count | geo
-------+-----
53900 | US
39354 | DE
31423 | GB
26551 | FR
26215 | ES
(5 rows)
Flags: needinfo?(scabral)
Reporter
Updated•10 years ago
Summary: Change to a lua geoip lib based on libmaxminddb → Make sure that geoIP lookups are working correctly
Whiteboard: [unifiedTelemetry][b5]
Assignee
Updated•10 years ago
Whiteboard: [unifiedTelemetry][b5] → [unifiedTelemetry][b5][data-validation]
Updated•10 years ago
Assignee: nobody → kparlante
Updated•10 years ago
Whiteboard: [unifiedTelemetry][b5][data-validation] → [unifiedTelemetry][40b9][data-validation]
Updated•10 years ago
Iteration: --- → 42.3 - Aug 10
Assignee
Updated•10 years ago
Iteration: 42.3 - Aug 10 → 43.1 - Aug 24
Assignee
Comment 6•9 years ago
Updates on this bug:
- We do see some v2/v4 discrepancies when we look at an individual clientId [1]
- Bagheera/v2 and heka/v4 are using the same old-style .dat database [2]
- Both update regularly (heka/v4 updates daily), but they might update at a slightly different cadence
- whd found problems [2] with the current lib [3] while load testing, and as part of that work is recommending that we WONTFIX this bug
[1] See Appendix C https://docs.google.com/document/d/1XLaW7lq-dL6bcd7dixsk2K5F8TSgSrjG5Oy4kzweWLs/edit
[2] https://github.com/mozilla-services/data-pipeline/issues/115
[3] https://github.com/agladysh/lua-geoip
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX
Comment 7•9 years ago
I don't understand what the resolution of this bug means. Are you saying that geoip itself is WONTFIX, or that properly testing geoip is WONTFIX? Or is this bug tracking something other than data validation of the geoip bits?
Updated•9 years ago
Flags: needinfo?(kparlante)
Assignee
Comment 8•9 years ago
The original bug was a question about whether or not we needed to upgrade MaxMind's GeoIP Legacy (old-style .dat) to MaxMind's GeoIP2 database (.mmdb). This was the actionable decision to make.
Summary of differences: http://dev.maxmind.com/geoip/geoip2/whats-new-in-geoip2/. The two formats have the same accuracy: https://support.maxmind.com/ (see the GeoIP FAQ). FWIW, we could upgrade to the precision database if we wanted improved accuracy at the city level.
The bug morphed into comparing to v2, questioning if the "legacy" format was less accurate than whatever v2 was using.
After comparing the data, looking at what v2 was actually using (exact same database), researching the differences between the formats, and checking the frequency with which we update the database (daily, updating to the paid, most accurate version), we have no reason to believe we are less accurate than v2, or that we have any problems with accuracy.
The exploration we embarked upon is done, and we're proposing we WONTFIX the change to the GeoIP mmdb.
Flags: needinfo?(kparlante)
Summary: Make sure that geoIP lookups are working correctly → Change to a lua geoip lib based on libmaxminddb
Updated•6 years ago
Product: Cloud Services → Cloud Services Graveyard