Closed
Bug 1126891
Opened 9 years ago
Closed 9 years ago
Analyze top sites experiment data with Interest Dashboard classifier
Categories
(Content Services Graveyard :: Classification Engine, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
Iteration:
38.3 - 23 Feb
People
(Reporter: Mardak, Assigned: mzhilyaev)
References
Details
(Whiteboard: .003)
Let's take the subdomain impressions from bug 1062708 and calculate some classification coverage precision/recall. We'll probably have at least 2 sets of numbers: one for just unique subdomain coverage and another weighted by the number of impressions per subdomain.
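A minimal sketch of the two coverage numbers proposed above: unique-subdomain coverage and impression-weighted coverage. It assumes the per-subdomain impression counts and the classifier output are available as plain Python dicts; the function name `coverage` and this input layout are hypothetical. It measures coverage only (what fraction received any category). Precision/recall would additionally need hand-labeled categories to compare against.

```python
from typing import Mapping, Optional, Tuple


def coverage(impressions: Mapping[str, int],
             categories: Mapping[str, Optional[str]]) -> Tuple[float, float]:
    """Return (unique-subdomain coverage, impression-weighted coverage).

    impressions: hypothetical map of subdomain -> impression count.
    categories:  hypothetical map of subdomain -> category, or None when
                 the classifier produced no category for that subdomain.
    """
    total_sites = len(impressions)
    total_impressions = sum(impressions.values())
    # Subdomains the classifier assigned some category to.
    covered = [s for s in impressions if categories.get(s) is not None]
    site_cov = len(covered) / total_sites if total_sites else 0.0
    # Weight each covered subdomain by its impression count.
    impr_cov = (sum(impressions[s] for s in covered) / total_impressions
                if total_impressions else 0.0)
    return site_cov, impr_cov
```

Using the two sites mentioned later in this bug (espn.go.com with 30789 impressions, categorized; some.random.site.com with 1 impression, not categorized), unique coverage is 0.5 while impression-weighted coverage is 30789/30790, which shows why the two numbers can differ so much.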
Assignee
Comment 1•9 years ago
Resetting to the next iteration, as I am currently working on infernyx rules and analytics for site co-occurrence data. Unless there's a pressing business need to get this data sooner, I would like to look at it next iteration.
Iteration: 38.2 - 9 Feb → 38.3 - 23 Feb
Points: --- → 13
Assignee
Comment 2•9 years ago
site_stats_daily needs to be rebuilt before we can run classification; currently there are only 20 sites in this table:

    psql (9.3.1, server 8.0.2)
    Type "help" for help.

    tiles=> select count(distinct(url)) from site_stats_daily where url != '';
     count
    -------
        20
    (1 row)
Assignee
Comment 3•9 years ago
Only 6% of all the tile URLs were classified by the UP classification we use in ID.
The list of categorized sites is here: https://people.mozilla.org/~mzhilyaev/tiles/en-US.US.sites_categorized
Statistics per UP category are here: https://people.mozilla.org/~mzhilyaev/tiles/en-US.US.cats_stats

The UP classifier will perform poorly on bare domains or hosts. Also note that site impressions cannot simply be added to get an impression count for a category: multiple sports sites may occur on the same new-tab page.

We need a better strategy for audience sizing, and to do so we need to understand audience segmentation. I am closing this bug as it's formally fixed, but we MUST talk about how we segment and size the audience. In my opinion, this deserves a separate story bug.
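A minimal sketch of the double-counting problem described above, assuming each new-tab page view is available as the set of sites shown on it (the input layout, the function name `category_counts`, and the `*.example` site names are hypothetical). Summing per-site impressions counts a page twice when two sites of the same category appear on it; counting each category at most once per page avoids that. Even the per-page count is not an audience size (unique users), which is the segmentation question raised here.

```python
from collections import Counter
from typing import Iterable, Mapping, Set, Tuple


def category_counts(pages: Iterable[Set[str]],
                    site_category: Mapping[str, str]) -> Tuple[Counter, Counter]:
    """Return (summed site impressions, distinct page views) per category."""
    summed = Counter()    # naive: one count per site impression
    distinct = Counter()  # one count per page view showing the category
    for sites in pages:
        cats_on_page = set()
        for site in sites:
            cat = site_category.get(site)
            if cat is None:
                continue  # uncategorized site
            summed[cat] += 1
            cats_on_page.add(cat)
        # Each category is counted at most once for this page view.
        for cat in cats_on_page:
            distinct[cat] += 1
    return summed, distinct
```

With one page showing two sports sites and another showing one, the naive sum gives 3 sports impressions while only 2 page views actually showed sports content.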
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Reporter
Comment 4•9 years ago
What's the coverage weighted by impression? E.g., espn.go.com has 30789 impressions and was categorized, but some.random.site.com, with 1 impression, wasn't categorized.
Assignee
Comment 5•9 years ago
These will be columns 3 and 5 of https://people.mozilla.org/~mzhilyaev/tiles/en-US.US.cats_stats:

    category,sites,impressions,% of total sites,% of total impressions
    UNCATEGORIZED,38043,5191729,93.38,86.66
    fashion,7,61,0.02,0.01
    ...
    sports,395,111758,0.97,15.94

The number in the third column is the sum of the impressions of all sites falling into the category.
The number in the fifth column is the category's impression sum as a percentage of the total impression sum.
All uncategorized sites were put into the UNCATEGORIZED category, which covers 93.38% of all sites and 86.66% of the sum of all sites' impressions.
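A minimal sketch of how rows in the cats_stats layout could be produced from per-site records, assuming each record is a (site, impressions, category-or-None) tuple; the function name `cats_stats` and the record layout are hypothetical, not the script actually used. Note that column 3 is exactly the naive per-site sum that comment 3 warns cannot be read as a category audience. The impression-weighted coverage asked about in comment 4 is 100 minus UNCATEGORIZED's fifth column, i.e. 100 - 86.66 = 13.34%.

```python
from collections import defaultdict
from typing import Iterable, List, Optional, Tuple

# (category, sites, impressions, % of total sites, % of total impressions)
Row = Tuple[str, int, int, float, float]


def cats_stats(records: Iterable[Tuple[str, int, Optional[str]]]) -> List[Row]:
    sites = defaultdict(int)
    impressions = defaultdict(int)
    total_sites = 0
    total_impressions = 0
    for _site, impr, cat in records:
        # Sites without a category go into a single UNCATEGORIZED bucket.
        key = cat if cat is not None else "UNCATEGORIZED"
        sites[key] += 1
        impressions[key] += impr  # naive sum of per-site impressions
        total_sites += 1
        total_impressions += impr
    rows = []
    for key in sorted(sites):
        rows.append((
            key,
            sites[key],
            impressions[key],
            round(100 * sites[key] / total_sites, 2),
            round(100 * impressions[key] / total_impressions, 2),
        ))
    return rows
```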