Autodetect repeated use of a search field, and ask the user if they want that engine added to the browser search box

NEW
Unassigned

Status

()

Firefox
Search
P3
enhancement
Rank:
33
11 years ago
8 months ago

People

(Reporter: Joe Hughes, Unassigned)

Tracking

(Blocks: 2 bugs)

2.0 Branch
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [gsoc][fxsearch])

Attachments

(1 attachment, 9 obsolete attachments)

(Reporter)

Description

11 years ago
This is Ben Goodger's idea.  If we notice that the user has used a particular search box more than X times, we should give them the option to add it as an option in the browser search box.  (If they decide they want to, we can auto-generate a search plugin from the form.)
Please also an option to switch that off for people who don't have the searchbox on their toolbar and use keywords instead.
(Reporter)

Updated

11 years ago
No longer blocks: 335435
(Reporter)

Comment 2

11 years ago
Ben says "See http://wiki.mozilla.org/Search_Service:Code_Design#Automatic_Detection for
the auto-detection heuristic."
Depends on: 879155
Assignee: nobody → sankha93
Whiteboard: [gsoc][mentor=MattN]
Status: NEW → ASSIGNED
Blocks: 882035
Created attachment 761322 [details] [diff] [review]
patch with just algorithm

I have just implemented the search detection algorithm in this patch. Lots of places have been marked TODO, these are mostly related to implementing the JSON storage. I will be working on that now.

In the meantime can you just give feedback on this?
Attachment #761322 - Flags: feedback?(mnoorenberghe+bmo)
Comment on attachment 761322 [details] [diff] [review]
patch with just algorithm

Review of attachment 761322 [details] [diff] [review]:
-----------------------------------------------------------------

This is a good start. I think you should try get the module hooked up to the browser and get the detection to run and log some output before going much deeper. A working form listener that detects search forms is probably enough for a Part 1 patch.  Subsequent patches can build on top of that probably.

An idea for later. We should check if there are event handlers (e.g. onsubmit) for the form and potentially skip it. Maybe a lack of useful @action is enough to detect these?

::: toolkit/modules/AutosuggestSearchEngine.jsm
@@ +10,5 @@
> +const Cu = Components.utils;
> +
> +this.AutosuggestSearchEngine = {
> +    Cu.import("resource://gre/modules/Services.jsm");
> +    Cu.import("resource://gre/modules/PrivateBrowsingUtils.jsm");

FYI: This is not valid JS as-is. The imports should move above the object. The others can move there or somewhere else depending on how they're used.

@@ +65,5 @@
> +        log: function (message) {
> +            if (!this.debug)
> +                return;
> +            dump("autosuggestFormListener: " + message + "\n");
> +            Services.console.logStringMessage("autosuggestFormListener: " + message);

Nit: You can import the new console.jsm and it will take care of both of these output methods.

@@ +74,5 @@
> +        handleEvent: function (e) {
> +            switch (e.type) {
> +                case "unload":
> +                    Services.obs.removeObserver(this, "earlyformsubmit");
> +                    Services.prefs.addObserver("browser.autosuggestSearch.", this, false);

This addObserver doesn't seem right

@@ +93,5 @@
> +        },
> +
> +        /* nsIFormSubmitObserver Interface */
> +
> +        notify: function (form, domWin, actionURI, cancelSubmit) {

We should ensure there is an actionURI because we would need it to create an engine. We should probably also check that it's http/https

@@ +110,5 @@
> +                this.log("Form submit observer notified.");
> +
> +                let entries = [];
> +
> +                //TODO: load formScore and mostLikelyField from database

This would also be where we see if the domain is relevant to the user based on Places data.

@@ +128,5 @@
> +                    minSearches = number of min submission threshold
> +                    mostLikelyField = the fields suspected to be possible
> +                    TODO: implement these from the JSON storage
> +                    */
> +                    if (input.type == "text" && fLength > minSearches && input.name != mostLikelyField.name ||

We have a helper to know if a field is text-like which may be useful. input.mozIsTextField(true): http://mxr.mozilla.org/mozilla-central/source/dom/interfaces/html/nsIDOMHTMLInputElement.idl#103

It uses nsIFormControl::IsSingleLineTextControl from http://mxr.mozilla.org/mozilla-central/source/content/html/content/public/nsIFormControl.h#234

@@ +133,5 @@
> +                        fLength > mostLikelyField.fLength) {
> +                        mostLikelyField.name = input.name;
> +                        mostLikelyField.fLength++;
> +                        numberOfTextFields++;
> +                        if (input.value.indexOf("@") < 0)

You can use .contains [1] although I'm not sure we should worry about this special-case for version 1. 

[1] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/contains

@@ +140,5 @@
> +                            formScore += 1;
> +                    }
> +
> +                    if (input.type == "search" && fLength > minSearches && input.name != mostLikelyField.name ||
> +                        fLength > mostLikelyField.fLength) {

It seems like these differences could be included inside the previous if block based on the type so there is less duplication

@@ +147,5 @@
> +                        numberOfTextFields++;
> +                        formScore += 2;
> +                    }
> +
> +                    if (input.name == "search" || input.name == "q" || input.name == "query") {

I'm not sure if we want this initially.
Attachment #761322 - Flags: feedback?(mnoorenberghe+bmo) → feedback+
Created attachment 764845 [details] [diff] [review]
patch with storage component

I have implemented the JSOn file storage and the helper functions, also fixed the comments above. Can you go through this?
Attachment #761322 - Attachment is obsolete: true
Attachment #764845 - Flags: feedback?(mnoorenberghe+bmo)
Created attachment 774019 [details] [diff] [review]
module with initial implementation

I have removed most of the TODO comments. What should be the next target in the implementation?

I used DeferredTask, but that can delay the task by a certain time, but can we have something like setInterval() as well, so that it is repeated after a specific interval always?

I will be working on the tests.
Attachment #764845 - Attachment is obsolete: true
Attachment #764845 - Flags: feedback?(mnoorenberghe+bmo)
Attachment #774019 - Flags: feedback?(mnoorenberghe+bmo)
Created attachment 775070 [details] [diff] [review]
hacky patch
Attachment #774019 - Attachment is obsolete: true
Attachment #774019 - Flags: feedback?(mnoorenberghe+bmo)
Attachment #775070 - Flags: feedback?(mnoorenberghe+bmo)
Created attachment 780009 [details] [diff] [review]
WIP Patch v1

This patch computes the form scores exactly as proposed in the project proposal.

This does I/O after receiving every form submission event though.
Attachment #775070 - Attachment is obsolete: true
Attachment #775070 - Flags: feedback?(mnoorenberghe+bmo)
Attachment #780009 - Flags: feedback?(mnoorenberghe+bmo)
Created attachment 782241 [details] [diff] [review]
patch v2 + tests

I have modified the algorithm a bit, and also written the tests to find if the search form scoring algorithm is working correctly.
Attachment #780009 - Attachment is obsolete: true
Attachment #780009 - Flags: feedback?(mnoorenberghe+bmo)
Attachment #782241 - Flags: feedback?(mnoorenberghe+bmo)
Comment on attachment 782241 [details] [diff] [review]
patch v2 + tests

Review of attachment 782241 [details] [diff] [review]:
-----------------------------------------------------------------

Excellent! Great to see tests! :)

I haven't looked closely at the JSM again but wanted to submit what I had already.

::: browser/base/content/browser.js
@@ +91,5 @@
>  XPCOMUtils.defineLazyModuleGetter(this, "Weave",
>    "resource://services-sync/main.js");
>  #endif
>  
> +XPCOMUtils.defineLazyGetter(this, "AutosuggestSearchEngine", function() {

It seems like you can just use defineLazyModuleGetter here unless I'm missing something.

@@ +1181,5 @@
>          setUrlAndSearchBarWidthForConditionalForwardButton();
>      });
>  
> +    //enable Autosuggest Search Engine
> +    AutosuggestSearchEngine.init();

Nit: can you put this beside gFormSubmitObserver.init(); and remove the comment

::: toolkit/modules/AutosuggestSearchEngine.jsm
@@ +5,5 @@
> +this.EXPORTED_SYMBOLS = ["AutosuggestSearchEngine"];
> +
> +const Cc = Components.classes;
> +const Ci = Components.interfaces;
> +const Cr = Components.results;

Nit: Cc and Cr are unused and can be removed. You can also use the following more compact form:

const { interfaces: Ci, utils: Cu } = Components;

@@ +10,5 @@
> +const Cu = Components.utils;
> +
> +Cu.import("resource://gre/modules/Services.jsm");
> +Cu.import("resource://gre/modules/PrivateBrowsingUtils.jsm");
> +Cu.import("resource://gre/modules/devtools/Console.jsm");

Before review, can you put the console.log calls behind the debug pref in a helper? You had the helper in a previous version. You can also put this import in a lazy module getter.

::: toolkit/modules/tests/plain/Makefile.in
@@ +2,5 @@
> +# License, v. 2.0. If a copy of the MPL was not distributed with this
> +# file, You can obtain one at http://mozilla.org/MPL/2.0/.
> +
> +DEPTH   = @DEPTH@
> +topsrcdir = @top_srcdir@

Nit: I think the pattern is to align the "=" but I could be wrong. Note: The build files will need build peer review eventually.

::: toolkit/modules/tests/plain/test_form_submission.html
@@ +7,5 @@
> +</head>
> +<body>
> +<p id="display"></p>
> +<div id="content" style="display: none">
> +  

Nit: trailing space

@@ +8,5 @@
> +<body>
> +<p id="display"></p>
> +<div id="content" style="display: none">
> +  
> +  <!-- ===== Things that should not be saved. ===== -->

I think this comment is a leftover from copying this test.

@@ +84,5 @@
> +}
> +
> +// Called by each form's onsubmit handler.
> +function checkSubmit(formNum) {
> +  netscape.security.PrivilegeManager.enablePrivilege('UniversalXPConnect');

I think we should avoid enablePrivilege in new tests as it's going away and I'm not sure it's necessary since you're using SpecialPowers.

@@ +89,5 @@
> +
> +  ok(true, "form " + formNum + " submitted");
> +  numSubmittedForms++;
> +
> +  if (formNum < 1 && formNum > 6) {

I think you meant ||

@@ +97,5 @@
> +  return submitForm(formNum);
> +}
> +
> +function submitForm(formNum)
> +{

Nit: cuddle the brace

@@ +115,5 @@
> +    case 5:
> +      is(AutosuggestSearchEngine.storageHandler.data[url + '#5'].formScore, 3.2, "GET form submission score is correct");
> +      break;
> +    case 6:
> +      is(AutosuggestSearchEngine.storageHandler.data[url + '#6'].formScore, 3.2, "GET form submission score is correct");

If I'm following this properly, this should say POST instead of GET and I think it should have a score of 0 like the following test because of the multiple text fields.

@@ +131,5 @@
> +  //
> +  // This in itself is fine, but if there are errors in the code, mochitests
> +  // will in some cases give you "server too busy", which is hard to debug!
> +  //
> +  setTimeout(function() {

SimpleTest.executeSoon would be preferred instead of setTimeout(…, 0) here but I think we should look into
Attachment #782241 - Flags: feedback?(mnoorenberghe+bmo)
Attachment #782241 - Flags: feedback+
Created attachment 798323 [details] [diff] [review]
AutodetectSearch.patch

Sorry, took some time in understanding the Places API.

I have addressed the previous comments. This takes the highest frecency of the domain name of the action URI and uses it to decide, whether it is a frequently enough visited site. Then it sends a notification.

Tests are not yet done for this part. It'll be helpful if you can provide some feedback on this.
Attachment #782241 - Attachment is obsolete: true
Attachment #782241 - Flags: feedback?(mnoorenberghe+bmo)
Attachment #798323 - Flags: feedback?(mnoorenberghe+bmo)
Comment on attachment 798323 [details] [diff] [review]
AutodetectSearch.patch

Review of attachment 798323 [details] [diff] [review]:
-----------------------------------------------------------------

Overall it looks good. Lets move the Places stuff to a separate patch and finish cleaning up the rest first and get it reviewed and landed.

I'll do faster review passes this week as I go on vacation starting Monday although I can still review your patches if we don't have things done by then.

::: modules/libpref/src/init/all.js
@@ +3921,5 @@
>  pref("browser.formfill.timeGroupingSize", 604800);
>  pref("browser.formfill.boundaryWeight",   25);
>  pref("browser.formfill.prefixWeight",     5);
>  
> +// Autosuggest Serach Engine prefs

typo: Serach

@@ +3923,5 @@
>  pref("browser.formfill.prefixWeight",     5);
>  
> +// Autosuggest Serach Engine prefs
> +pref("browser.autosuggestSearch.debug",   false);
> +pref("browser.autosuggestSearch.enable",  true);

This should be "enabled" (with the d) for consistency with other similar prefs and to match your code.

::: toolkit/modules/AutosuggestSearchEngine.jsm
@@ +3,5 @@
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> +
> +this.EXPORTED_SYMBOLS = ["AutosuggestSearchEngine"];
> +
> +const { classes: CC, interfaces: Ci, utils: Cu } = Components;

Nit: "Cc"

@@ +11,5 @@
> +Cu.import("resource://gre/modules/osfile.jsm");
> +Cu.import("resource://gre/modules/XPCOMUtils.jsm");
> +Cu.import("resource://gre/modules/PlacesUtils.jsm");
> +XPCOMUtils.defineLazyModuleGetter(this, "console",
> +                                  "resource://gre/modules/devtools/Console.jsm");

I believe the console object is already available in all JSMs so this is not necessary.

@@ +12,5 @@
> +Cu.import("resource://gre/modules/XPCOMUtils.jsm");
> +Cu.import("resource://gre/modules/PlacesUtils.jsm");
> +XPCOMUtils.defineLazyModuleGetter(this, "console",
> +                                  "resource://gre/modules/devtools/Console.jsm");
> +/**

Could you add a comment here at the top of the file giving a high-level overview of what the module does.

@@ +32,5 @@
> +    },
> +
> +    /**
> +     * This implements the Storage handling in a JSON file.
> +     * The data is stored in {profile-Dir}/autosuggest.json

more descriptive filename

@@ +41,5 @@
> +     *                                  mostLikelyField: 'query',
> +     *                                  formScore: 0 }
> +     * }
> +     */
> +    storageHandler: {

Why are the methods implemented at the bottom of the JSM separate from the properties?

@@ +59,5 @@
> +    /**
> +     * This listens to form submissions and triggers the algorithm to compute
> +     * the most likely search field.
> +     */
> +    autosuggestFormListener: {

This will need some more work in a follow-up to work with E10S IIUC.

@@ +74,5 @@
> +         */
> +        init: function() {
> +            Services.obs.addObserver(this, "earlyformsubmit", false);
> +            Services.prefs.addObserver("browser.autosuggestSearch.", this, false);
> +            /* this.updatePrefs(); */ // will be enabled after this is completed

It seems like this can be uncommented once you fix all.js, right?

@@ +89,5 @@
> +        handleEvent: function (e) {
> +            switch (e.type) {
> +                case "unload":
> +                    Services.obs.removeObserver(this, "earlyformsubmit");
> +                    Services.prefs.addObserver("browser.autosuggestSearch.", this, false);

I think this should be removeObserver with only the first 2 args.

@@ +114,5 @@
> +                if (!this.enabled)
> +                    return;
> +
> +                let actionURI = action_nsIURI.spec;
> +                let actionHost = action_nsIURI.host;

"action_nsIURI" doesn't follow the naming pattern of other variables in this file. "actionURI" should be sufficient. Generally, in Fx code, a variable with "URI" in the name is an nsIURI whereas a "URL" indicates a URL as a string.
You should null-check action_nsIURI as this seems to fail on a data URI.

@@ +117,5 @@
> +                let actionURI = action_nsIURI.spec;
> +                let actionHost = action_nsIURI.host;
> +
> +                if (actionURI == "" || actionURI == undefined)
> +                    return;

I think !actionURI is probably sufficient, no?

@@ +123,5 @@
> +                if (PrivateBrowsingUtils.isWindowPrivate(domWin))
> +                    return;
> +
> +                if (this.debug)
> +                    console.log("Form submit observer notified.");

As a reminder: "Before review, can you put the console.log calls behind the debug pref in a helper? You had the helper in a previous version."

@@ +128,5 @@
> +
> +                let password = false;
> +
> +                if (!AutosuggestSearchEngine.storageHandler.isAvailable(actionURI))
> +                    AutosuggestSearchEngine.storageHandler.createNew(actionURI);

This feels a bit awkward to me as I don't see why we need…

@@ +132,5 @@
> +                    AutosuggestSearchEngine.storageHandler.createNew(actionURI);
> +
> +                let fLength = AutosuggestSearchEngine.storageHandler.getNumberOfSubmissions(actionURI),
> +                    formScore = AutosuggestSearchEngine.storageHandler.getFormScore(actionURI),
> +                    mostLikelyField = AutosuggestSearchEngine.storageHandler.getMostLikelyField(actionURI),

Is the point of the get* functions to hide the structure of the objects for an actionURI? Otherwise, can't we have the above 6 lines be one function AutosuggestSearchEngine.storageHandler.getForm(actionURI); which returns the object or creates a new one if an entry doesn't exist?

If you just want to keep track of when storage is dirty and don't trust consumers to set the dirty flag, you can use setters to achieve that.

@@ +135,5 @@
> +                    formScore = AutosuggestSearchEngine.storageHandler.getFormScore(actionURI),
> +                    mostLikelyField = AutosuggestSearchEngine.storageHandler.getMostLikelyField(actionURI),
> +                    numberOfTextFields = 0;
> +
> +                for (let i = 0; i < form.elements.length; i++) {

Move the form score calculation code to its own function. Leave all the early return checks above in this function.

@@ +143,5 @@
> +
> +                    if (input.type == "password") {
> +                        formScore = 0;
> +                        password = true;
> +                        break;

When the score calculation is in its own function, I think you may want to bail (return) from that function here as I don't believe we ever care about a form with a password, right?

@@ +174,5 @@
> +                fLength++;
> +
> +                AutosuggestSearchEngine.storageHandler.setNumberOfSubmissions(actionURI, fLength);
> +                AutosuggestSearchEngine.storageHandler.setFormScore(actionURI, formScore);
> +                AutosuggestSearchEngine.storageHandler.setMostLikelyField(actionURI, mostLikelyField);

I think these three functions would be cleaner as one.

@@ +177,5 @@
> +                AutosuggestSearchEngine.storageHandler.setFormScore(actionURI, formScore);
> +                AutosuggestSearchEngine.storageHandler.setMostLikelyField(actionURI, mostLikelyField);
> +                if (this.debug)
> +                    console.log("Form Score", formScore);
> +                AutosuggestSearchEngine.storageHandler.saveData();

I think we should avoid the save on every search submission use DeferredTask to only save at most once every N minutes and before shutdown. This way, a user who refines their search multiple times in a row in N minutes will only have additional disk I/O of 1 write and stringification from this module.

@@ +214,5 @@
> +/**
> + * Initializes the in-memory object to read/write data
> + */
> +AutosuggestSearchEngine.storageHandler.init = function() {
> +    let decoder = new TextDecoder();

It seems like this can move inside onSuccess

@@ +227,5 @@
> +
> +/**
> + * Helper method to find if an URI is present in the database
> + */
> +AutosuggestSearchEngine.storageHandler.isAvailable = function(actionURI) {

I suspect this function won't be needed anymore.

@@ +235,5 @@
> +/**
> + * creates a new object in case the URI is not present
> + */
> +AutosuggestSearchEngine.storageHandler.createNew = function(actionURI) {
> +    AutosuggestSearchEngine.storageHandler.data[actionURI] = { fLength: 0,

For all of these storangeHandler functions which access "data[actionURI]", we need to be careful that actionURI isn't something other than a URI (possibly malicious) that could have undesired consequences. (e.g. actionURI = "toString"). This may already be covered but I wanted to point it out as something to keep in mind.

@@ +237,5 @@
> + */
> +AutosuggestSearchEngine.storageHandler.createNew = function(actionURI) {
> +    AutosuggestSearchEngine.storageHandler.data[actionURI] = { fLength: 0,
> +                             mostLikelyField: { name: "",
> +                             fLength: 0 },

fLength is not descriptive enough, please replace this variable name everywhere. Something like "submissionCount" would be better.

@@ +238,5 @@
> +AutosuggestSearchEngine.storageHandler.createNew = function(actionURI) {
> +    AutosuggestSearchEngine.storageHandler.data[actionURI] = { fLength: 0,
> +                             mostLikelyField: { name: "",
> +                             fLength: 0 },
> +                             formScore: 0 };

Fix indentation here
Attachment #798323 - Flags: feedback?(mnoorenberghe+bmo) → feedback+
Created attachment 803894 [details] [diff] [review]
patch v4

I have addressed the previous comments. Also updated the tests, so they now check the most likely field in the form being tested.

Removed the unnecessary methods from storageHandler. The DeferredTask performs a write to the disk every 5 mins and once before the browser quits.
Attachment #798323 - Attachment is obsolete: true
Attachment #803894 - Flags: review?(mnoorenberghe+bmo)
Duplicate of this bug: 879155
No longer depends on: 879155
Blocks: 879155
Created attachment 803929 [details] [diff] [review]
patch v4.1

Same as the old one, but now has tests for checking if writing to disk is working properly.
Attachment #803894 - Attachment is obsolete: true
Attachment #803894 - Flags: review?(mnoorenberghe+bmo)
Attachment #803929 - Flags: review?(mnoorenberghe+bmo)
Comment on attachment 803929 [details] [diff] [review]
patch v4.1

I didn't have time to get to this today.

Flagging gps for review of the build changes (a new mochitest-plain sub-directory)
Attachment #803929 - Flags: review?(gps)
Comment on attachment 803929 [details] [diff] [review]
patch v4.1

Review of attachment 803929 [details] [diff] [review]:
-----------------------------------------------------------------

I trust you'll fix the nits before checkin.

::: toolkit/modules/tests/moz.build
@@ +3,5 @@
>  # This Source Code Form is subject to the terms of the Mozilla Public
>  # License, v. 2.0. If a copy of the MPL was not distributed with this
>  # file, You can obtain one at http://mozilla.org/MPL/2.0/.
>  
> +DIRS += ['browser', 'plain']

Let's call it 'mochitest-plain.' 'plain' isn't a directory name currently and is slightly ambiguous IMO.

::: toolkit/modules/tests/plain/Makefile.in
@@ +12,5 @@
> +MOCHITEST_FILES = \
> +		test_form_submission.html \
> +		$(NULL)
> +
> +include $(topsrcdir)/config/rules.mk

You only need the license header and the MOCHITEST_FILES assignment now. The other boilerplate is automatic.
Attachment #803929 - Flags: review?(gps) → review+
Created attachment 804820 [details] [diff] [review]
patch v4.2

Addresses gps's comments.
Attachment #803929 - Attachment is obsolete: true
Attachment #803929 - Flags: review?(mnoorenberghe+bmo)
Attachment #804820 - Flags: review?(mnoorenberghe+bmo)
Comment on attachment 804820 [details] [diff] [review]
patch v4.2

I still need to look into the perf impact of the storage and I'll have time for that after Australis (Fx29) ships.
Attachment #804820 - Flags: review?(MattN+bmo)
Flags: needinfo?(MattN+bmo)
(Assignee)

Updated

3 years ago
Mentor: MattN+bmo@mozilla.com
Whiteboard: [gsoc][mentor=MattN] → [gsoc]
Phillip, is this something we're still interested in? There is a patch to implement it but it just needs to deal with performance issues in comment 19.
Flags: needinfo?(MattN+bmo) → needinfo?(philipp)
Generally: yes :)
I don't think I've ever seen the UI for this though. Do you happen to have a screen shot of what this looks like?
Flags: needinfo?(philipp)
This bug was to implement the backend that lets the UI know via a autosuggestSearch.positive message and then bug 879155 was going to do the UI. I think the plan was to use the existing OpenSearch UI that shows the green + icon in the search bar.

Updated

2 years ago
Priority: -- → P3
Whiteboard: [gsoc] → [gsoc][fxsearch]

Updated

2 years ago
Rank: 33
Assignee: sankha93 → nobody
Mentor: MattN+bmo@mozilla.com
Status: ASSIGNED → NEW
You need to log in before you can comment on or make changes to this bug.