Closed
Bug 1108622
Opened 10 years ago
Closed 9 years ago
[back-end] Implement Folding Tests for URL classification on Moreover corpus
Categories
(Content Services Graveyard :: Classification Engine, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
Iteration:
38.1 - 26 Jan
People
(Reporter: mzhilyaev, Assigned: mzhilyaev)
References
Details
(Whiteboard: .?)
We need to test classification accuracy/recall on the Moreover corpus by folding the corpus into an older chunk and a more recent chunk: training occurs on the older chunk, and the result is applied to the recent chunk. If user history is incorporated into URL classification via model fitting (bug 1104335), the testing should be extended to recency folding as well. That, however, requires a user model that synthesizes history from site popularity.
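A minimal sketch of the date-based folding described above, not the code used in pfeed. The document shape ({ url, category, published }) and the helper names foldByDate and precisionRecall are assumptions made for illustration only.

```javascript
// Hypothetical document shape: { url, category, published }, where
// `published` is a Date. None of these names come from the pfeed code.

// Split a corpus into an older chunk (for training) and a recent chunk
// (for evaluation). Documents dated exactly at the cutoff go to `recent`.
function foldByDate(docs, cutoff) {
  const older = [];
  const recent = [];
  for (const doc of docs) {
    (doc.published < cutoff ? older : recent).push(doc);
  }
  return { older, recent };
}

// Precision and recall for a single category, given parallel arrays of
// expected and predicted labels for the recent chunk.
function precisionRecall(expected, predicted, category) {
  let tp = 0;
  let fp = 0;
  let fn = 0;
  for (let i = 0; i < expected.length; i++) {
    const isExpected = expected[i] === category;
    const isPredicted = predicted[i] === category;
    if (isExpected && isPredicted) tp++;
    else if (!isExpected && isPredicted) fp++;
    else if (isExpected && !isPredicted) fn++;
  }
  return {
    // Report 0 when the denominator is empty rather than NaN.
    precision: tp + fp === 0 ? 0 : tp / (tp + fp),
    recall: tp + fn === 0 ? 0 : tp / (tp + fn),
  };
}
```

The older chunk would feed whatever training step is available, and precisionRecall would then be computed per category on the labels predicted for the recent chunk.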
Updated•10 years ago
Points: --- → 13
Whiteboard: .?
Comment 1•10 years ago
We should have a general way to build a rule set from one subset of the corpus and apply it to another subset, to allow for folding tests. However, there is currently no algorithmic way to compute the rule set; all updates to the Matthew payload are manual. Bug 1104364 is filed for automating rule generation.
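To illustrate the build-on-one-subset, apply-to-another idea, here is a rough sketch that uses a stand-in rule generator. Since no algorithmic rule generation existed at this point, the "rule" here (map each hostname to its most frequent category in the training subset) is purely an assumption, as are the names buildRules, applyRules, and the { url, category } document shape.

```javascript
// Extract the hostname from a URL string using the standard URL class.
function hostOf(url) {
  return new URL(url).hostname;
}

// Stand-in rule generator (an assumption, not the pfeed algorithm):
// for each host seen in the training subset, pick the most frequent category.
function buildRules(trainDocs) {
  const counts = new Map(); // host -> Map(category -> count)
  for (const { url, category } of trainDocs) {
    const host = hostOf(url);
    if (!counts.has(host)) counts.set(host, new Map());
    const perHost = counts.get(host);
    perHost.set(category, (perHost.get(category) || 0) + 1);
  }
  const rules = new Map(); // host -> winning category
  for (const [host, perHost] of counts) {
    let best = null;
    let bestCount = 0;
    for (const [category, n] of perHost) {
      if (n > bestCount) {
        best = category;
        bestCount = n;
      }
    }
    rules.set(host, best);
  }
  return rules;
}

// Apply the rule set to a different subset; hosts without a rule yield null.
function applyRules(rules, testDocs) {
  return testDocs.map(({ url }) => {
    const host = hostOf(url);
    return rules.has(host) ? rules.get(host) : null;
  });
}
```

Keeping rule building and rule application as separate functions over arbitrary document subsets is what makes folding possible: any generator with the same inputs and outputs could replace buildRules once rule generation is automated.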
Updated•9 years ago
Summary: [back-end] Testing URL classification on Moreover corpus → [back-end] Implement Folding Tests for URL classification on Moreover corpus
Updated•9 years ago
Iteration: 37.3 - 12 Jan → 38.1 - 26 Jan
Comment 2•9 years ago
This generic approach was implemented by allowing DFR generation and rule generation to be run on a specific chunk of the corpus. Both scripts accept -f date and -t date arguments to select documents by date range.
Assignee: nobody → mzhilyaev
Comment 3•9 years ago
./generateCorpusStats.js -h
USAGE: generateCorpusStats.js [OPTIONS]
Generates Corpus URL and Title stats
  -h, --help            display this help
  -v, --verbous         display debug info
  -d, --dbHost=ARG      db hosts: default=localhost
  -p, --dbPort=ARG      db port: default=27017
  -f, --fromDate=ARG    starting from date in yyyy/mm/dd format (like 2014/10/01): default none
  -t, --toDate=ARG      ending from date in yyyy/mm/dd format (like 2014/10/01): default none
  -l, --limit=ARG       docs limit: default none

./generateDFRStats.js -h
USAGE: generateDFRStats.js [OPTIONS] [DFR FILES]
Generates DFR matching stats
  -h, --help            display this help
  -v, --verbous         display debug info
  -d, --dbHost=ARG      db hosts: default=localhost
  -p, --dbPort=ARG      db port: default=27017
  -f, --fromDate=ARG    starting from date in yyyy/mm/dd format (like 2014/10/01): default none
  -t, --toDate=ARG      ending from date in yyyy/mm/dd format (like 2014/10/01): default none
  -l, --limit=ARG       docs limit: default none
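For illustration, the from/to options could be turned into a date-range filter roughly as follows. This is a sketch under assumptions, not the scripts' actual code: the default port 27017 suggests a MongoDB-style store, so the filter uses $gte/$lte operators; the field name "published", the inclusive upper bound, and the helper names parseDate and dateRangeFilter are all hypothetical.

```javascript
// Parse a "yyyy/mm/dd" string (the format named in the help text) into a
// UTC Date. Throws on input that does not match the format.
function parseDate(text) {
  const m = /^(\d{4})\/(\d{2})\/(\d{2})$/.exec(text);
  if (!m) throw new Error("expected yyyy/mm/dd, got: " + text);
  return new Date(Date.UTC(Number(m[1]), Number(m[2]) - 1, Number(m[3])));
}

// Build a MongoDB-style filter from optional from/to dates.
// `field` is a hypothetical name for the document's date field.
// With neither bound given, the filter is empty and matches everything,
// mirroring the "default none" behaviour in the help text.
function dateRangeFilter(fromDate, toDate, field = "published") {
  const range = {};
  if (fromDate) range.$gte = parseDate(fromDate);
  if (toDate) range.$lte = parseDate(toDate); // inclusive bound is an assumption
  return Object.keys(range).length === 0 ? {} : { [field]: range };
}
```

Running the same script twice with complementary ranges, one ending at the cutoff and one starting from it, would yield the older and recent chunks needed for a folding test.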
Comment 4•9 years ago
Commit pushed to master at https://github.com/mzhilyaev/pfeed
https://github.com/mzhilyaev/pfeed/commit/5318f4805ae2d2894f0a238fb54d336b29202dd3
Closes Bug 1108622 - Fix missing from/to search params
Updated•9 years ago
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED