Closed
Bug 1148474
Opened 9 years ago
Closed 9 years ago
Generate white list from similar sites data and compare with alexa top-1m list
Categories
(Content Services Graveyard :: Tiles, defect, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
Iteration:
39.3 - 30 Mar
People
(Reporter: mzhilyaev, Assigned: mzhilyaev)
References
Details
(Whiteboard: .?)
Attachments
(1 file, 1 obsolete file)
704.46 KB, text/plain
No description provided.
Comment 1 (Assignee) • 9 years ago
Format of the file: site | occurrences in sites listing | alexa rank
Comment 2 (Assignee) • 9 years ago
The file was generated by collecting all sites in the similar-sites data and counting the number of times each site was mentioned in any of the lists. An Alexa rank was then assigned to a site if it was listed in alexa top-1m.csv; otherwise 9999999 was assigned.
The sites were chosen if:
- sim-sites occurrence is 2 or higher
- or if alexa rank is below 200000
The resulting list has 50557 entries and covers 86% of EdRules (which is better than alexa's original white list). I would recommend using the attached white list over the top 50K of alexa top-1m.csv.
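The selection described in this comment can be sketched roughly as follows. This is an illustration only, not the script that produced the attachment: the function names, the in-memory input shapes, and the assumption that the Alexa file consists of `rank,site` rows are mine. Repeated mentions of a site within one list are counted each time, which may or may not match how the original count was done.

```python
import csv
from collections import Counter
from io import StringIO

UNRANKED = 9999999  # sentinel from the comment: site not present in the Alexa list


def load_alexa_ranks(csv_text):
    """Parse 'rank,site' rows (assumed layout of top-1m.csv) into {site: rank}."""
    ranks = {}
    for row in csv.reader(StringIO(csv_text)):
        if len(row) >= 2:  # skip blank or malformed rows
            ranks[row[1]] = int(row[0])
    return ranks


def build_whitelist(similar_site_lists, alexa_ranks,
                    min_occurrences=2, max_rank=200000):
    """Return sorted (site, occurrences, rank) rows that pass either threshold."""
    # Count every mention of a site across all similar-sites lists.
    counts = Counter(site for sites in similar_site_lists for site in sites)
    rows = []
    for site, occurrences in counts.items():
        rank = alexa_ranks.get(site, UNRANKED)
        # Keep the site if it occurs often enough OR ranks well enough.
        if occurrences >= min_occurrences or rank < max_rank:
            rows.append((site, occurrences, rank))
    return sorted(rows)


def format_row(row):
    """Render one row as 'site | occurrences | alexa rank'."""
    return " | ".join(str(field) for field in row)
```

Calling `build_whitelist` with the parsed similar-sites lists and the dict from `load_alexa_ranks`, then passing each row through `format_row`, gives lines in the file format from comment 1. Because unranked sites get 9999999, they can only qualify through the occurrence threshold.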
Comment 3 (Assignee) • 9 years ago
Removed junk domains like: 4.cn 6.cn com. d.cn g.cn i.ua j.mp o.cn org. q.gs t.cn t.co u.tv w.cn
Attachment #8584843 -
Attachment is obsolete: true
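The rule used for this cleanup is not stated in the bug. As a guess only, every example listed above is either a bare TLD with a trailing dot or a two-label name whose first label is a single character, so a heuristic filter along those lines might look like the sketch below. The function names are hypothetical, and this rule would also drop legitimate one-letter domains, so it should not be read as the actual criterion.

```python
def is_junk_domain(domain):
    """Heuristic guess: bare TLD with a trailing dot, or a one-character name."""
    if domain.endswith("."):
        return True  # e.g. "com." or "org."
    labels = domain.split(".")
    # Two labels with a single-character first label, e.g. "t.co" or "4.cn".
    return len(labels) == 2 and len(labels[0]) == 1


def drop_junk(domains):
    """Return the domains that the heuristic does not flag, in original order."""
    return [d for d in domains if not is_junk_domain(d)]
```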
Comment 4 (Assignee) • 9 years ago
Per conversation with Mardak, closing this bug: the white list seems to perform well, and it is difficult to improve it by adding more sites from Alexa or by raising the Alexa rank cutoff for sites with a single sim-sites occurrence.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED