Closed Bug 1104322 Opened 10 years ago Closed 1 month ago

UP Research Meta-Bug

Categories

(Content Services Graveyard :: Classification Engine, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: mzhilyaev, Assigned: mruttley)

References

Details

(Whiteboard: .?)

This is an umbrella bug for corpus analytics related to heuristics rules extraction.

Currently the rules are extracted as follows:
- Moreover topics are mapped to IAB categories
- domain, subdomain, and path rules are computed for documents with non-empty IAB categories
- rules are sorted by precision
- rules are then selected through a manual process
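
The extraction steps above can be sketched roughly as follows. This is only an illustration under assumptions, not the actual pipeline: the function names (candidate_keys, extract_rules), the min_support cutoff, the choice of the first path segment as the path rule, and the naive "last two labels" domain split are all hypothetical. Precision here means the share of categorized documents matching a rule that carry the rule's category.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse


def candidate_keys(url):
    """Yield (rule_type, key) pairs for one URL (hypothetical rule shapes)."""
    parts = urlparse(url)
    host = parts.hostname or ""
    labels = host.split(".")
    # Naive registrable domain: the last two labels (wrong for suffixes like co.uk).
    domain = ".".join(labels[-2:])
    yield ("domain", domain)
    if len(labels) > 2:
        yield ("subdomain", host)
    segments = [s for s in parts.path.split("/") if s]
    if segments:
        # Assumption: a path rule is the domain plus the first path segment.
        yield ("path", domain + "/" + segments[0])


def extract_rules(docs, min_support=2):
    """docs: iterable of (url, categories) pairs.

    Returns (rule_type, key, category, precision, support) tuples,
    sorted by precision, highest first.
    """
    totals = Counter()            # categorized docs matching each key
    hits = defaultdict(Counter)   # per key: docs carrying each category
    for url, categories in docs:
        if not categories:        # only docs with non-empty categories count
            continue
        for key in set(candidate_keys(url)):
            totals[key] += 1
            for category in set(categories):
                hits[key][category] += 1

    rules = []
    for key, total in totals.items():
        if total < min_support:   # hypothetical cutoff to drop one-off keys
            continue
        category, count = hits[key].most_common(1)[0]
        rules.append((key[0], key[1], category, count / total, total))

    # Highest precision first; ties broken by support, then by key.
    rules.sort(key=lambda r: (-r[3], -r[4], r[1]))
    return rules
```

The sorted list would then go to the manual selection step; nothing here automates that step.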

There are a number of issues with this approach:
- title and URL keywords are not analyzed
- only 60% of the content is categorized
- many URLs come from RSS feeds, so their canonical form is unknown

There are problems with the IAB taxonomy as well:
- some nodes are sparsely populated, such as automotive/diesel
- some important nodes do not exist, such as video games
- some categories are overpopulated and non-specific, such as technology or entertainment
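
One way to quantify the imbalance described above is to compute each node's share of the categorized documents and flag the outliers. The sketch below is an assumption-laden illustration: flag_unbalanced_nodes and both thresholds are hypothetical, and a node that is missing from the taxonomy altogether (the video games case) cannot be detected this way, since it never appears in the counts.

```python
from collections import Counter


def flag_unbalanced_nodes(doc_categories, low_share=0.02, high_share=0.25):
    """doc_categories: iterable of per-document category lists.

    Returns (sparse, crowded): nodes whose share of categorized documents
    falls below low_share or above high_share (hypothetical thresholds).
    """
    counts = Counter()
    categorized = 0
    for categories in doc_categories:
        if not categories:        # uncategorized docs do not affect shares
            continue
        categorized += 1
        counts.update(set(categories))

    sparse, crowded = [], []
    for node, count in sorted(counts.items()):
        share = count / categorized
        if share < low_share:
            sparse.append(node)
        elif share > high_share:
            crowded.append(node)
    return sparse, crowded
```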

Since we will likely need to provide finer and more balanced interest categorization, we may need to retrain classifiers for the existing (IAB) and extended taxonomies.
Component: Interest Dashboard → Classification Engine
Depends on: 1104364
Depends on: 1104367
Whiteboard: .?
Depends on: 1104329
Depends on: 1104335
Depends on: 1104376
Summary: [back-end] Heuristics re-computation process → [back-end] Heuristics re-computation process (placeholder for back-end classification support)
Depends on: 1108622
Depends on: 1108626
No longer depends on: 1104364
No longer depends on: 1108622
Depends on: 1122629
Assignee: nobody → mruttley
Depends on: 1109305
Summary: [back-end] Heuristics re-computation process (placeholder for back-end classification support) → UP Research Meta-Bug
Depends on: 1127894
Blocks: 1189757
Depends on: 1202777
Depends on: 1202779
Depends on: 1202806
Depends on: 1202807
Depends on: 1202809
Depends on: 1202812
Depends on: 1202813
Depends on: 1202814
Depends on: 1202815
Status: NEW → RESOLVED
Closed: 1 month ago
Resolution: --- → INCOMPLETE