Closed
Bug 1104322
Opened 10 years ago
Closed 1 month ago
UP Research Meta-Bug
Categories
(Content Services Graveyard :: Classification Engine, defect)
Tracking
(Not tracked)
RESOLVED
INCOMPLETE
People
(Reporter: mzhilyaev, Assigned: mruttley)
References
Details
(Whiteboard: .?)
This is an umbrella bug for corpus analytics related to heuristics rules extraction.

Currently the rules are extracted as follows:
- moreover topics are mapped to IAB categories
- domain, subdomain and path rules are computed for docs with non-empty IAB categories
- rules are sorted by precision
- rules are then selected via a manual process

There are a number of issues with this approach:
- title and URL keywords are not subjected to analysis
- only 60% of the content is categorized
- many URLs are from RSS feeds, hence their canonical form is unknown

There are problems with the IAB taxonomy as well:
- some nodes are sparsely populated, like automotive/diesel
- some important nodes do not exist, like video games
- some categories are overpopulated and non-specific, like technology or entertainment

Since we will likely need to provide finer and more balanced interest categorization, we may need to retrain classifiers for the existing (IAB) and extended taxonomies.
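The extraction steps described above (compute domain, subdomain and path rules for categorized docs, then sort by precision) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual pipeline: the names `rule_keys`, `extract_rules` and `min_support` are hypothetical, the input is assumed to be `(url, categories)` pairs, the registered domain is naively taken as the last two host labels, and precision is assumed to mean the share of categorized docs matching a rule that carry the given category.

```python
from collections import Counter
from urllib.parse import urlsplit


def rule_keys(url):
    """Yield hypothetical (kind, key) rule candidates for a URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Naive registered-domain guess: the last two host labels.
    # This is an assumption; it is wrong for suffixes such as co.uk.
    domain = ".".join(host.split(".")[-2:])
    yield ("domain", domain)
    if host != domain:
        yield ("subdomain", host)
    segments = [s for s in parts.path.split("/") if s]
    if segments:
        # Path rule: host plus the first path segment only.
        yield ("path", f"{host}/{segments[0]}")


def extract_rules(docs, min_support=2):
    """Return (rule, category, precision, support) sorted by precision."""
    matched = Counter()  # categorized docs matching each rule
    hits = Counter()     # categorized docs matching a rule AND a category
    for url, categories in docs:
        if not categories:
            continue  # only docs with non-empty categories are used
        for rule in set(rule_keys(url)):
            matched[rule] += 1
            for cat in set(categories):
                hits[rule, cat] += 1
    rules = [
        (rule, cat, n / matched[rule], matched[rule])
        for (rule, cat), n in hits.items()
        if matched[rule] >= min_support
    ]
    # Highest precision first, then highest support; names break ties.
    rules.sort(key=lambda r: (-r[2], -r[3], r[0], r[1]))
    return rules
```

The sorted list would then feed the manual selection step. Uncategorized docs are skipped entirely here, so a rule's precision says nothing about the roughly 40% of content without a category.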
Reporter
Updated • 10 years ago
Component: Interest Dashboard → Classification Engine
Reporter
Updated • 10 years ago
Whiteboard: .?
Reporter
Updated • 10 years ago
Summary: [back-end] Heuristics re-computation process → [back-end] Heuristics re-computation process (placeholder for back-end classification support)
Assignee
Updated • 9 years ago
Summary: [back-end] Heuristics re-computation process (placeholder for back-end classification support) → UP Research Meta-Bug