Closed Bug 1169366 Opened 5 years ago Closed 5 years ago

Update data file with new allowed frecent sites list

Categories

(Firefox :: New Tab Page, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
Firefox 41
Iteration:
41.3 - Jun 29
Tracking Status
firefox40 --- fixed
firefox41 --- fixed

People

(Reporter: Mardak, Assigned: emtwo)

References

(Blocks 1 open bug)

Details

(Whiteboard: .001)

Attachments

(7 files, 7 obsolete files)

78.39 KB, patch
Mardak
: review+
Details | Diff | Splinter Review
51.71 KB, application/json
Details
118.03 KB, application/json
Details
29.79 KB, image/png
Details
29.47 KB, image/png
Details
75.09 KB, patch
Details | Diff | Splinter Review
4.37 KB, patch
Details | Diff | Splinter Review
We have ~60 new frecent sites lists to add for suggested tiles launch bug 1140185.

We'll also tweak some existing categories, e.g., adding techcrunch to technology.

This is currently waiting on finalization of the list and measuring the privacy aspects of each grouping.
Iteration: --- → 41.2 - Jun 8
Do we have a user facing name that can go with each of these? Firefox will fall back to showing the hardcoded name if the server doesn't provide an explanation text, e.g.,

Suggested for Business/ Finance_Investing News visitors
If we're using "A/n [category] site suggestion" the max character is 24 if we want to keep it at one line. One line looks better, but we should account for two for localization. 

If we're using "Suggested for [category] visitors" the max character for one line is 22.

Jterry/Ashonnard, can you come up with user facing category names? We should only 1 level names. i.e. not the full "Business/ Finance_Investing News" but "Investing News"
Flags: needinfo?(jterry)
Flags: needinfo?(ashonnard)
Where are you getting those numbers from? The font isn't fixed width, so "mmmmm" might not fit but "iiiii" could fit even though they have the same number of characters.
"Suggested for (x) visitors" is already 22 characters without adding in the category!!

What is the max length we should be working for? This will help us cut down the category names.
Flags: needinfo?(jterry)
Flags: needinfo?(edilee)
Flags: needinfo?(ashonnard)
I believe kghim was referring to the category name itself being 22 characters as he's already accounted for the other text on the line.

Although he retested with using the widest character (W) and could only fit ~10 and the thinnest character fit ~30.
Flags: needinfo?(edilee)
(In reply to ashonnard from comment #6)
> "Suggested for (x) visitors" is already 22 characters without adding in the
> category!!
> 
> What is the max length we should be working for? This will help us cut down
> the category names.

Ashonnard, the character would just be for the category name. I think we can expand this to 30 characters because of the thin letter (iiii) and wide letter (WWWW) combination. 

As mruttley indicated on email, we should have category name that we keep for hierarchical purposes and a display name which will have a 30 character limit.
All maxP values apart from Mozilla are under 0.7 now, the highest being 0.67. 
More data can be found here: https://docs.google.com/spreadsheets/d/1aM7FZwA3ounZVybS4xfhmTbKoHTRx_BiePVTLUFMnD4/edit#gid=0

I will attach the new cats.json file. If you'd like to work on the provisional names for the adgroups, I can create a new feature in Bucketerer?
Attachment #8615540 - Attachment is obsolete: true
(In reply to Kevin Ghim from comment #8)
> (In reply to ashonnard from comment #6)
> > "Suggested for (x) visitors" is already 22 characters without adding in the
> > category!!
> > 
> > What is the max length we should be working for? This will help us cut down
> > the category names.
> 
> Ashonnard, the character would just be for the category name. I think we can
> expand this to 30 characters because of the thin letter (iiii) and wide
> letter (WWWW) combination. 
> 
> As mruttley indicated on email, we should have category name that we keep
> for hierarchical purposes and a display name which will have a 30 character
> limit.

kevin, going out on a limb here...
Do we have the flexibility to end it with any type of collective noun? For example,  "Suggested for literature fans" or "Suggested for finance interests". Ending each bucket with visitors is a little hard for top lining buckets, such as "Suggested for literature visitors" or "Suggested for finance visitors" 

Hoping we can use other collective nouns to make it sound more broad and less robotic. 
Obviously this phrase was discussed for legal so let me know if i am far off with this request. 
-a
Flags: needinfo?(kghim)
I could create a feature which would let Bucketerer users edit:

 - The official category taxonomy name e.g. Arts & Entertainment / Literature
 - The short name: Literature
 - The phrase: Literature fans
(In reply to ashonnard from comment #11)
> (In reply to Kevin Ghim from comment #8)
> > (In reply to ashonnard from comment #6)
> > > "Suggested for (x) visitors" is already 22 characters without adding in the
> > > category!!
> > > 
> > > What is the max length we should be working for? This will help us cut down
> > > the category names.
> > 
> > Ashonnard, the character would just be for the category name. I think we can
> > expand this to 30 characters because of the thin letter (iiii) and wide
> > letter (WWWW) combination. 
> > 
> > As mruttley indicated on email, we should have category name that we keep
> > for hierarchical purposes and a display name which will have a 30 character
> > limit.
> 
> kevin, going out on a limb here...
> Do we have the flexibility to end it with any type of collective noun? For
> example,  "Suggested for literature fans" or "Suggested for finance
> interests". Ending each bucket with visitors is a little hard for top lining
> buckets, such as "Suggested for literature visitors" or "Suggested for
> finance visitors" 
> 
> Hoping we can use other collective nouns to make it sound more broad and
> less robotic. 
> Obviously this phrase was discussed for legal so let me know if i am far off
> with this request. 
> -a

Annelise, good suggestion, but until the end of June, we'll have to stick with "Suggested for [category] visitors" because that explanation text is hardcoded on the client. Once 40 goes into beta, we can send in the explanation text which will be "A/n [category] site suggestion" Does that have a better flow?

Mruttley, that would be great.
Flags: needinfo?(kghim)
We can already send the custom explanation from the server. It's just that only 40 users will see the explanation. So we should use the custom explanation now if we have it.
(In reply to Kevin Ghim from comment #13)
> (In reply to ashonnard from comment #11)
> > (In reply to Kevin Ghim from comment #8)
> > > (In reply to ashonnard from comment #6)
> > > > "Suggested for (x) visitors" is already 22 characters without adding in the
> > > > category!!
> > > > 
> > > > What is the max length we should be working for? This will help us cut down
> > > > the category names.
> > > 
> > > Ashonnard, the character would just be for the category name. I think we can
> > > expand this to 30 characters because of the thin letter (iiii) and wide
> > > letter (WWWW) combination. 
> > > 
> > > As mruttley indicated on email, we should have category name that we keep
> > > for hierarchical purposes and a display name which will have a 30 character
> > > limit.
> > 
> > kevin, going out on a limb here...
> > Do we have the flexibility to end it with any type of collective noun? For
> > example,  "Suggested for literature fans" or "Suggested for finance
> > interests". Ending each bucket with visitors is a little hard for top lining
> > buckets, such as "Suggested for literature visitors" or "Suggested for
> > finance visitors" 
> > 
> > Hoping we can use other collective nouns to make it sound more broad and
> > less robotic. 
> > Obviously this phrase was discussed for legal so let me know if i am far off
> > with this request. 
> > -a
> 
> Annelise, good suggestion, but until the end of June, we'll have to stick
> with "Suggested for [category] visitors" because that explanation text is
> hardcoded on the client. Once 40 goes into beta, we can send in the
> explanation text which will be "A/n [category] site suggestion" Does that
> have a better flow?
> 
> Mruttley, that would be great.

Kevin, would love to think about alternative flow. In the meantime, i will update the categories to fit within "Suggested for (category) visitors"
Iteration: 41.2 - Jun 8 → 41.3 - Jun 29
:ashonnard - I've created a new bug to integrate better descriptors here: https://bugzilla.mozilla.org/show_bug.cgi?id=1173569
Attached patch wip (obsolete) — Splinter Review
untested. unit tests probably need to be updated
Just to save this somewhere..

="  [ '" & JOIN(",", SORT(TRANSPOSE(SPLIT(INDIRECT("adgroups!B" & ROUND(ROW()/2)), "#")))) & "',"
="    '" & REGEXREPLACE(INDIRECT("adgroups!C" & ROUND(ROW()/2)), "(^An? | site suggestion$)", "") & "' ],"
Attached patch wip 2 (obsolete) — Splinter Review
We need a new test and fix code to allow for the names and descriptions with &

XML Parsing Error: not well-formed
Location: 
Line Number 1, Column 197:
<div xmlns="http://www.w3.org/1999/xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><span xmlns="http://www.w3.org/1999/xhtml"><div class='newtab-suggested-bounds'> Suggested for <strong> events & tickets </strong> visitors </div>

Tested with https://people.mozilla.org/~elee/suggested.tickets.json
It looks like we'll need to escape 5 characters &<>'" and perhaps at minimal these two: &<
Assignee: nobody → msamuel
An easier solution is changing "&" to "and", as we still fall below the character count.
Is that possible?
Flags: needinfo?(kghim)
Flags: needinfo?(edilee)
We could just use "and" then we could split off a separate bug to handle the & and < causing xml entity issues. Alternatively we could just hardcode & as &amp; for now.
Flags: needinfo?(edilee)
(In reply to Ed Lee :Mardak from comment #23)
> We could just use "and" then we could split off a separate bug to handle the
> & and < causing xml entity issues. Alternatively we could just hardcode & as
> &amp; for now.

Whichever you think is easier. "and" seems like a quick fix while "&" is being worked out. 

also, i found the copy paste error from yesterday. "charity" set off the alignment, as it wasn't in the previous list. Please confirm the following:
Entertainment_General -----> "An entertainment site suggestion" -----> "entertainment"
Environment_General  -----> "An environment site suggestion" -----> "environment"
Flags: needinfo?(edilee)
With these copy/paste issues, it's probably safest to get the data from the source with Bucketerer export. It's unclear if the list of sites changed from one sheet to another or if the list in Bucketerer is different now.
Flags: needinfo?(edilee)
(In reply to Ed Lee :Mardak from comment #25)
> With these copy/paste issues, it's probably safest to get the data from the
> source with Bucketerer export. It's unclear if the list of sites changed
> from one sheet to another or if the list in Bucketerer is different now.

Updated in Bucketer-er. 
http://172.29.2.113:5010/

Do you need a new export?
Flags: needinfo?(edilee)
Attachment #8623346 - Attachment is obsolete: true
Attachment #8623370 - Attachment is obsolete: true
Attachment #8623891 - Flags: review?(edilee)
Comment on attachment 8623891 [details] [diff] [review]
v1: Bucket updates & escaping html characters

>+++ b/browser/base/content/test/newtab/browser_newtab_enhanced.js
>+  suggestedLink.explanation = "Testing junk explanation &<>\"'";
>+  ok(suggested.indexOf("Testing junk explanation &amp;&lt;&gt;\"'") > -1, "Junk test");
Any idea why " and ' don't end up as &quot; and &#039;? I noticed it wasn't entirely necessary to escape the quotes anyway.

>+++ b/browser/modules/DirectoryLinksProvider.jsm
>+  _escapeChars: function(text) {
nit: _escapeChars(text) {

>+    let charMap = {
>+      '&': '&amp;',
>+      '<': '&lt;',
>+      '>': '&gt;',
>+      '"': '&quot;',
>+      "'": '&#039;'
>+    };
>+
>+    return text.replace(/[&<>"']/g, function(character) {
>+      return charMap[character];
nit: replace(.., char => charMap[char])

>+  [ 'askmen.com,boredomtherapy.com,buzzfeed.com,complex.com,dailymotion.com,elitedaily.com,gawker.com,howstuffworks.com,instagram.com,madamenoire.com,polygon.com,ranker.com,rollingstone.com,ted.com,theblaze.com,thechive.com,thecrux.com,thedailybeast.com,thoughtcatalog.com,uproxx.com,upworthy.com,zergnet.com',
>+    'environment' ],
As Annelise noted, there was some copy/paste error from some part of the chain. askmen is not quite about the environment. ;)
'entertainment'

Assuming bucketerer has the source of truth, we should get a direct export. Ideally in the [[sorted_sites_string, short_desc_string], ...] format
Flags: needinfo?(edilee)
Attachment #8623891 - Flags: review?(edilee) → review+
(In reply to Ed Lee :Mardak from comment #28)
> >+  [ 'askmen.com,boredomtherapy.com,buzzfeed.com,complex.com,dailymotion.com,elitedaily.com,gawker.com,howstuffworks.com,instagram.com,madamenoire.com,polygon.com,ranker.com,rollingstone.com,ted.com,theblaze.com,thechive.com,thecrux.com,thedailybeast.com,thoughtcatalog.com,uproxx.com,upworthy.com,zergnet.com',
> >+    'environment' ],
> 'entertainment'
To be clear, let's land with the data file we have now except change this askmen.com environment -> entertainment. If there are changes from bucketerer, we'll file a new bug and separately update the data file again.
(In reply to Ed Lee :Mardak from comment #28)
> Comment on attachment 8623891 [details] [diff] [review]
> v1: Bucket updates & escaping html characters
> 
> >+++ b/browser/base/content/test/newtab/browser_newtab_enhanced.js
> >+  suggestedLink.explanation = "Testing junk explanation &<>\"'";
> >+  ok(suggested.indexOf("Testing junk explanation &amp;&lt;&gt;\"'") > -1, "Junk test");
> Any idea why " and ' don't end up as &quot; and &#039;? I noticed it wasn't
> entirely necessary to escape the quotes anyway.

I was wondering that myself but I'm not entirely sure. It might have something to do with encodeURIComponent().
(In reply to Matthew Ruttley [:mruttley] from comment #31)
> Created attachment 8623925 [details]
> JSON dump of buckets, accurate as of 2015-06-17 6.30pm

Please use the most updated JSON (comment 31)
Comment on attachment 8623925 [details]
JSON dump of buckets, accurate as of 2015-06-17 6.30pm

Some issues with this file:

>            "A pet site suggestion "
trailing space

>            "toywiz.com "
>        "name": "Retail_Toys"
trailing space for toywiz

>            "An events and tickets site suggestion"
Annelise said this should be & not "and"

>            "progressive.com", 
>            "progressive.com ", 
>        "name": "Insurance_General"
Duplicate progressive with one having trailing space

>            "alibaba.com", 
>            "alibaba.com ", 
>        "name": "Retail_General Shopping"
Same issue trailing space on duplicate

>            "geeks.com ", 
>            "outletpc.com ", 
>            "outpost.com ", 
>        "name": "Technology_Retail"
Several with extra space
Attached file cleaned data
JSON.parse(require("fs").readFileSync("adgroups.json","utf8")).map(function(a){return[a.sites.map(function(v){return v.trim()}).filter(function(v,p,s){return s.indexOf(v)==p}).sort()+"",a.descriptors[0].replace(/^\s*An?\s*|\s*(site )?suggestion\s*$/g, "")]})
Comment on attachment 8623941 [details]
Latest data as of 2015-06-17 7.01pm, no dupes, no trailing spaces

This data file is indeed clean and the conversion script:

JSON.parse(require("fs").readFileSync("adgroups.json","utf8")).map(function(a){return[a.sites+"",a.descriptors[0].replace(/^An? | (site )?suggestion$/g,"")]})

Looks like bucketerer still has Retail_Tickets as             "An events and tickets site suggestion" instead of ".. events & tickets .."
(In reply to Ed Lee :Mardak from comment #36)
> Comment on attachment 8623941 [details]
> Latest data as of 2015-06-17 7.01pm, no dupes, no trailing spaces
> 
> This data file is indeed clean and the conversion script:
> 
> JSON.parse(require("fs").readFileSync("adgroups.json","utf8")).
> map(function(a){return[a.sites+"",a.descriptors[0].replace(/^An? | (site
> )?suggestion$/g,"")]})
> 
> Looks like bucketerer still has Retail_Tickets as             "An events and
> tickets site suggestion" instead of ".. events & tickets .."

it should be ".. events & tickets .."
I updated it in Bucket-er.
Please confirm you can update it to ".. events & tickets .." wherever applicable
https://hg.mozilla.org/mozilla-central/rev/ad64b12d3b01
Status: NEW → RESOLVED
Closed: 5 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → Firefox 41
Flags: needinfo?(kghim)
Attached image 2015-06-18 nightly
screenshot after visiting enough sites and ticketmaster.com with about:config browser.newtabpage.directory.source set to

https://people.mozilla.org/~elee/suggested.tickets.json
Same as previous attachment except source as

https://people.mozilla.org/~elee/suggested.tickets.explain.json

which has an extra

            "explanation": "An %1$S site suggestion",
Comment on attachment 8623891 [details] [diff] [review]
v1: Bucket updates & escaping html characters

Approval Request Comment

[Feature/regressing bug #]: Feature: add more suggested tiles targeting buckets

[User impact if declined]: Some suggestions may be targeted inaccurately or have missing potential targets 

[Describe test coverage new/current, TreeHerder]: Tested manually on Nightly and locally on Aurora. Mochitests adjusted/added where appropriate.

[Risks and why]: Low risk - just adding more suggestion buckets and a function to escape characters in their labels.

[String/UUID change made/needed]: N/A
Attachment #8623891 - Flags: approval-mozilla-aurora?
Could you split the change into two ? the function and the updated list of suggestion
or is that too much work?
Sure. Does that help with approval?
See comment 42
Attachment #8624744 - Flags: approval-mozilla-aurora?
See comment 42
Attachment #8624747 - Flags: approval-mozilla-aurora?
Attachment #8623891 - Flags: approval-mozilla-aurora?
Attachment #8624747 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 8624744 [details] [diff] [review]
[aurora] Part 1: Bucket updates

I forgot to update it.
Attachment #8624744 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.