Closed Bug 1104322 Opened 10 years ago Closed 1 month ago

UP Research Meta-Bug

Tracking

(Not tracked)

Status:

RESOLVED INCOMPLETE

People

(Reporter: mzhilyaev, Assigned: mruttley)

References

Details

(Whiteboard: .?)

maxim zhilyaev

Reporter

Description

10 years ago

This is an umbrella bug for corpus analytics related to heuristic rule extraction.

Currently the rules are extracted as follows:
- Moreover topics are mapped to IAB categories
- domain, subdomain and path rules are computed for docs with non-empty IAB categories
- rules are sorted by precision
- rules are then selected manually
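The extraction steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the actual pipeline. The `TOPIC_TO_IAB` mapping, the `(url, topics)` document shape, the `min_support` cut-off and the "registrable domain = last two host labels" rule are all hypothetical. Here, the precision of a rule is the share of categorized docs matching the rule key that carry its most frequent IAB category. The manual selection step is not modeled. The function only returns the sorted candidates for review.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit

# Hypothetical Moreover-topic -> IAB-category mapping (illustrative only).
TOPIC_TO_IAB = {
    "Basketball": "Sports",
    "Film": "Arts & Entertainment",
    "Gadgets": "Technology & Computing",
}


def map_topics(topics, mapping=TOPIC_TO_IAB):
    """Map Moreover topics to a set of IAB categories, dropping unmapped ones."""
    return {mapping[t] for t in topics if t in mapping}


def rule_keys(url):
    """Yield candidate (kind, key) rules for one URL."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Assumption: registrable domain = last two host labels (ignores suffixes like co.uk).
    domain = ".".join(host.split(".")[-2:])
    yield ("domain", domain)
    if host != domain:
        yield ("subdomain", host)
    segments = [s for s in parts.path.split("/") if s]
    if segments:
        # Only the first path segment is used as a path rule here.
        yield ("path", f"{host}/{segments[0]}")


def compute_rules(docs, mapping=TOPIC_TO_IAB, min_support=2):
    """docs: iterable of (url, moreover_topics). Returns rules sorted by precision."""
    totals = Counter()           # categorized docs matching each rule key
    hits = defaultdict(Counter)  # per rule key: IAB category -> doc count
    for url, topics in docs:
        cats = map_topics(topics, mapping)
        if not cats:  # only docs with non-empty IAB categories are used
            continue
        for key in rule_keys(url):
            totals[key] += 1
            for cat in cats:
                hits[key][cat] += 1
    rules = []
    for key, n in totals.items():
        if n < min_support:
            continue
        cat, count = hits[key].most_common(1)[0]
        rules.append((key, cat, count / n, n))
    # Highest precision first; ties broken by larger support.
    rules.sort(key=lambda r: (-r[2], -r[3]))
    return rules
```

Each returned tuple is `(rule key, IAB category, precision, support)`. A reviewer would then pick from the top of this list, as in the current manual process.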

There are a number of issues with this approach:
- title and URL keywords are not analyzed
- only 60% of the content is categorized
- many URLs come from RSS feeds, so their canonical form is unknown

There are problems with the IAB taxonomy as well:
- some nodes are sparsely populated, e.g. automotive/diesel
- some important nodes, such as video games, do not exist
- some categories are overpopulated and non-specific, e.g. technology or entertainment

Since we will likely need to provide finer-grained and more balanced interest categorization, we may need to retrain classifiers for the existing (IAB) taxonomy and for extended taxonomies.

maxim zhilyaev

Reporter

Updated

10 years ago

Component: Interest Dashboard → Classification Engine

maxim zhilyaev

Reporter

Updated

10 years ago

Depends on: 1104364

maxim zhilyaev

Reporter

Updated

10 years ago

Depends on: 1104367

maxim zhilyaev

Reporter

Updated

10 years ago

Whiteboard: .?

maxim zhilyaev

Reporter

Updated

10 years ago

Depends on: 1104329

maxim zhilyaev

Reporter

Updated

10 years ago

Depends on: 1104335

maxim zhilyaev

Reporter

Updated

10 years ago

Depends on: 1104376

maxim zhilyaev

Reporter

Updated

10 years ago

Summary: [back-end] Heuristics re-computation process → [back-end] Heuristics re-computation process (placeholder for back-end classification support)

maxim zhilyaev

Reporter

Updated

10 years ago

Depends on: 1108622

maxim zhilyaev

Reporter

Updated

10 years ago

Depends on: 1108626

maxim zhilyaev

Reporter

Updated

9 years ago

No longer depends on: 1104364

maxim zhilyaev

Reporter

Updated

9 years ago

No longer depends on: 1108622

maxim zhilyaev

Reporter

Updated

9 years ago

Depends on: 1122629

Hermina

Updated

9 years ago

Assignee: nobody → mruttley

Matthew Ruttley [:mruttley]

Assignee

Updated

9 years ago

Depends on: 1109305

Matthew Ruttley [:mruttley]

Assignee

Updated

9 years ago

Summary: [back-end] Heuristics re-computation process (placeholder for back-end classification support) → UP Research Meta-Bug

Matthew Ruttley [:mruttley]

Assignee

Updated

9 years ago

Depends on: 1127894

Hermina

Updated

9 years ago

Blocks: 1189757

Hermina

Updated

9 years ago

Depends on: 1202777

Hermina

Updated

9 years ago

Depends on: 1202779

Hermina

Updated

9 years ago

Depends on: 1202806

Hermina

Updated

9 years ago

Depends on: 1202807

Hermina

Updated

9 years ago

Depends on: 1202809

Hermina

Updated

9 years ago

Depends on: 1202812

Hermina

Updated

9 years ago

Depends on: 1202813

Hermina

Updated

9 years ago

Depends on: 1202814

Hermina

Updated

9 years ago

Depends on: 1202815

u597032

Updated

1 month ago

Status: NEW → RESOLVED

Closed: 1 month ago

Resolution: --- → INCOMPLETE