Bug 1232746 (Closed): opened 4 years ago, closed 4 years ago

Document process for generating inline autocomplete domain list

Categories

(Firefox for Android :: General, defect)

35 Branch
defect
Not set

Tracking


RESOLVED FIXED
Firefox 46
Tracking Status
firefox46 --- fixed

People

(Reporter: Margaret, Assigned: mfinkle)

Attachments

(1 file, 1 obsolete file)

Let's land some documentation in the tree to describe the process we use to generate the list of inline autocomplete suggestions.

I can write a patch for this, but Barbara, can you help me come up with the text we'll use for this?
Flags: needinfo?(bbermes)
Attached patch defaultdomains-doc v0.1 (obsolete)
Margaret, Barbara - Take a look at the text
Nick - Did I get the RST stuff right?
Assignee: margaret.leibovic → mark.finkle
Attachment #8698549 - Flags: review?(nalexander)
Attachment #8698549 - Flags: review?(margaret.leibovic)
Attachment #8698549 - Flags: review?(bbermes)
Duplicate of this bug: 1232757
Comment on attachment 8698549
defaultdomains-doc v0.1

Review of attachment 8698549:
-----------------------------------------------------------------

Looks pretty good!  Sorry to bitrot you, but this should go in mobile/android/docs now.  (See https://bugzilla.mozilla.org/show_bug.cgi?id=1230786.)  There's an upside, though!  You can run |mach doc mobile/android| and get just the Fennec documentation built.  That'll make it easier to sort out warnings and any formatting issues.

I read the content too; I had a few nits, but others can iterate.  No significant comments.

::: mobile/android/base/docs/defaultdomains.rst
@@ +1,3 @@
> +.. -*- Mode: rst; fill-column: 100; -*-
> +
> +======================================

My guess is you'll want fewer = signs, to match the title length.
Attachment #8698549 - Flags: review?(nalexander) → review+
NI legal for their review on this as well
Flags: needinfo?(merwin)
Flags: needinfo?(elee)
Flags: needinfo?(bbermes)
Flags: needinfo?(elee) → needinfo?(ellee)
I updated the patch to handle the source folder changes, and fixed some typos and nits in the text.
Attachment #8698549 - Attachment is obsolete: true
Attachment #8698549 - Flags: review?(margaret.leibovic)
Attachment #8698549 - Flags: review?(bbermes)
Attachment #8698823 - Flags: review?(margaret.leibovic)
Attachment #8698823 - Flags: review?(bbermes)
Comment on attachment 8698823
defaultdomains-doc v0.2

Review of attachment 8698823:
-----------------------------------------------------------------

This matches what we currently do, so r+.

I'll leave it to the legal team to advise on whether we need to change any of this content.

::: mobile/android/docs/defaultdomains.rst
@@ +20,5 @@
> +history and you are forced to type full URLs. Shipping a set of top domains provides a fallback.
> +
> +The top domains list can be localized, but Firefox will fallback to using en-US as the default for all
> +locales that do not provide a specific set. The list can have several hundred domains, but due to
> +size concerns, is usually capped to five hundred or less.

I think we could get the majority of the benefit from shipping even 100 sites. If it makes the auditing easier, I would be in favor of an even shorter list.

@@ +25,5 @@
> +
> +Sanitizing Methods
> +------------------
> +
> +After getting a source list, e.g. Alexa top global sites, we apply some simple guidelines to the

We'll update this following the legal review, but I think we should be very explicit here about exactly where this initial list of sites is coming from.

@@ +31,5 @@
> +
> +* Remove any locale-specific domain duplicates. We assume primary URLs (.com) will redirect to the
> +  correct locale (.co.jp) at run-time.
> +* Remove any explicit adult content domains.
> +* Remove any sites that uses explicit or adult advertising.

Nit: "uses" should be "use".

@@ +37,5 @@
> +* Remove any content/CDN domains. Some sites use separate domains to store images and other
> +  static content.
> +* Remove any sites primarily used for advertising or management of advertising.
> +* Remove any sites that fail to load in mobile browsers.
> +* Remove any time/date specific sites that may have appeared on the list due to seasonal spikes.

How do we know this? Is there a way to get a top site list that's based on average usage over the past year? Maybe that would mitigate this problem (and allow us to update less frequently).

@@ +46,5 @@
> +Suggested sites are default thumbnails, displayed on the Top Sites home panel. A suggested site
> +consists of a title, thumbnail image, background color and URL. Multiple images are usually
> +required to handle the variety of device DPIs.
> +
> +Suggested sites can be localized, but Firefox will fallback to using en-US as the default for all

These are actually not currently localized. We should loop in flod here.
Attachment #8698823 - Flags: review?(margaret.leibovic) → review+
> 
> @@ +31,5 @@
> > +
> > +* Remove any locale-specific domain duplicates. We assume primary URLs (.com) will redirect to the
> > +  correct locale (.co.jp) at run-time.
> > +* Remove any explicit adult content domains.
> > +* Remove any sites that uses explicit or adult advertising.
> 
> Nit: use
> 

How do we identify adult content domains and adult advertising domains?  I saw Mark's suggestion that we just take top Alexa - the top Alexa adult category.  Are we taking that approach? Same question for the advertising & management of advertising sites. Do we just do this by manual inspection?
Flags: needinfo?(merwin)
Flags: needinfo?(ellee)
Flags: needinfo?(mark.finkle)
(In reply to Merwin from comment #7)
> > 
> > @@ +31,5 @@
> > > +
> > > +* Remove any locale-specific domain duplicates. We assume primary URLs (.com) will redirect to the
> > > +  correct locale (.co.jp) at run-time.
> > > +* Remove any explicit adult content domains.
> > > +* Remove any sites that uses explicit or adult advertising.
> > 
> > Nit: use
> > 
> 
> How do we identify adult content domains and adult advertising domains?  I
> saw Mark's suggestion that we just take top Alexa - the top Alexa adult
> category.  Are we taking that approach? Same question for the advertising &
> management of advertising sites. Do we just do this by manual inspection?

For adult content, we created a blocklist from the Alexa adult category. This is not complete though, so we end up doing manual inspection. We found more adult content sites through manual inspection than via the Alexa adult content list.

Advertising and marketing management sites are also discovered via manual inspection.

Sites that do not load in mobile browsers are also found via manual inspection.

All of these sites were added to the attached blocklist.
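The process described here amounts to a filtering pass: a ranked source list, a blocklist built ahead of time (the Alexa adult category plus manual inspection for adult, advertising, non-mobile and seasonal sites), removal of locale duplicates on the assumption that the .com site redirects, and a cap of about five hundred entries. Below is a minimal, hypothetical Python sketch of that pass. The function names, the `LOCALE_SUFFIXES` set and the input format (bare domains in rank order) are assumptions for illustration, not the actual tooling used to build the shipped list.

```python
"""Hypothetical sketch of sanitizing a ranked top-domain list.

Assumptions (not taken from the Firefox tree): the source list is ranked
bare domains, the blocklist was built beforehand (Alexa adult category
plus manual inspection), and LOCALE_SUFFIXES is a small illustrative set
rather than a full public-suffix list.
"""

from typing import Iterable, List, Optional, Set

# Illustrative only; a real pass would need a much fuller set of suffixes.
LOCALE_SUFFIXES = (".co.jp", ".co.uk", ".co.in", ".com.br", ".de", ".fr", ".ru")

# The thread mentions capping at five hundred or fewer.
DEFAULT_CAP = 500


def base_name(domain: str) -> Optional[str]:
    """Return the name in front of a known locale suffix, or None."""
    for suffix in LOCALE_SUFFIXES:
        if domain.endswith(suffix):
            return domain[: -len(suffix)]
    return None


def sanitize_top_domains(
    source: Iterable[str], blocklist: Set[str], cap: int = DEFAULT_CAP
) -> List[str]:
    """Filter a ranked domain list, keeping rank order, up to `cap` entries."""
    ranked = [d.strip().lower() for d in source if d.strip()]
    present = set(ranked)
    result: List[str] = []
    seen: Set[str] = set()
    for domain in ranked:
        # Skip exact duplicates and anything found by inspection/blocklisting.
        if domain in seen or domain in blocklist:
            continue
        base = base_name(domain)
        if base is not None and base + ".com" in present:
            # Assume the .com entry redirects to this locale site at run-time.
            continue
        seen.add(domain)
        result.append(domain)
        if len(result) >= cap:
            break
    return result


if __name__ == "__main__":
    source = [
        "google.com", "youtube.com", "google.co.jp",
        "blackfriday.example", "adult.example", "amazon.de", "amazon.com",
    ]
    blocked = {"adult.example", "blackfriday.example"}
    print(sanitize_top_domains(source, blocked))
```

The blocklist is an input rather than something the script computes, because, as described above, most adult, ad-management, non-mobile and seasonal sites were only found by manual inspection. Rank order is preserved so that the cap keeps the highest-ranked survivors.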
Flags: needinfo?(mark.finkle)
(In reply to :Margaret Leibovic from comment #6)

> > +* Remove any time/date specific sites that may have appeared on the list due to seasonal spikes.
> 
> How do we know this? Is there a way to get a top site list that's based on
> average usage over the past year? Maybe that would mitigate this problem
> (and allow us to update less frequently).

Manual inspection. We found several "Black Friday" sites in the top 500.
Comment on attachment 8698823
defaultdomains-doc v0.2

Landing this as the "what we do now" baseline. We can tweak with legal and product comments in a new bug.
Attachment #8698823 - Flags: review?(bbermes)
https://hg.mozilla.org/mozilla-central/rev/51a5ff26043b
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 46
Blocks: 1216987