Created attachment 8584843 [details]
sim-sites.whitelist - contains white list extracted from similar-sites data

Format of the file:
site | occurrences in sites listing | alexa rank
The file was generated by collecting all sites in the similar-sites data and counting the number of times each site was mentioned in any of the lists. Each site was then assigned its Alexa rank if it was listed in alexa top-1m.csv; otherwise a rank of 9999999 was assigned.

Sites were chosen if:
- the sim-sites occurrence count is 2 or higher
- or the Alexa rank is below 200000

The resulting list has 50557 entries and covers 86% of EdRules (which is better than Alexa's original white list). I would recommend using the attached white list over the top 50K of alexa top-1m.csv.
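For reference, the selection described above could be sketched roughly as follows. This is not the script that produced the attachment. The names build_whitelist and format_row, the in-memory input shapes (a list of site lists and a dict of Alexa ranks), the sort order, and the literal " | " separator are all assumptions made for illustration.

```python
from collections import Counter

# Rank assigned to sites that are not listed in alexa top-1m.csv.
UNRANKED = 9999999


def build_whitelist(sim_site_lists, alexa_rank, min_occurrences=2, max_rank=200000):
    """Return (site, occurrences, rank) rows for the selected sites.

    sim_site_lists: iterable of lists of site names (assumed input shape).
    alexa_rank: dict mapping site name to Alexa rank (assumed input shape).
    """
    # Count every mention of a site across all similar-sites lists.
    counts = Counter(site for sites in sim_site_lists for site in sites)

    rows = []
    for site, occurrences in counts.items():
        rank = alexa_rank.get(site, UNRANKED)
        # Keep a site if it occurs often enough or ranks well enough.
        if occurrences >= min_occurrences or rank < max_rank:
            rows.append((site, occurrences, rank))

    # Sorting by site name is only for deterministic output (an assumption).
    rows.sort(key=lambda row: row[0])
    return rows


def format_row(row):
    # Assumes the " | " in the format description is the literal separator.
    return "%s | %d | %d" % row
```

With this sketch, a site seen once and ranked 500000 would be dropped, while a site seen twice but absent from the Alexa list would be kept with rank 9999999.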
Created attachment 8585786 [details]
remove some junk domains

Removed junk domains like: 4.cn 6.cn com. d.cn g.cn i.ua j.mp o.cn org. q.gs t.cn t.co u.tv w.cn
Attachment #8584843 - Attachment is obsolete: true
Per conversation with Mardak, closing this bug: the white list seems to perform well, and it is difficult to improve it by adding more sites from Alexa or by increasing the selection rank for a single sim-site occurrence.
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED