Closed Bug 1040721 Opened 11 years ago Closed 11 years ago

Add ability to parse search result URLs

Tracking

()

Status:

RESOLVED FIXED

Milestone:

Firefox 34

Iteration:

34.1

People

(Reporter: Paolo, Assigned: Paolo)

References

Details

Attachments

(1 file, 5 obsolete files)

Preliminary patch 11 years ago :Paolo Amadini 12.94 KB, patch		Details \| Diff \| Splinter Review
The patch 11 years ago :Paolo Amadini 32.28 KB, patch		Details \| Diff \| Splinter Review
Updated patch 11 years ago :Paolo Amadini 35.49 KB, patch		Details \| Diff \| Splinter Review
New approach 11 years ago :Paolo Amadini 32.26 KB, patch	Gavin : feedback+	Details \| Diff \| Splinter Review
Using a map 11 years ago :Paolo Amadini 34.75 KB, patch	Gavin : review+	Details \| Diff \| Splinter Review
Final patch 11 years ago :Paolo Amadini 35.35 KB, patch		Details \| Diff \| Splinter Review

:Paolo Amadini

Assignee

Description

•

11 years ago

The details required for mapping an URL to a search engine and terms, such as the parameter name, search path and alternate domains listed in bug 1038225, should be exposed by nsSearchService.js so that the Places component can use them to identify the URL.

:Paolo Amadini

Assignee

Updated

•

11 years ago

Flags: firefox-backlog+

:Paolo Amadini

Assignee

Comment 1

•

11 years ago

Attached patch Preliminary patch (obsolete) — Details — Splinter Review

This does not have tests yet, but is a possible implementation of the design. The reason why the prefix and alternate domains are separated is to make it super-easy to import the data from the Google list. No processing is needed. See the patch I'll post in bug 959582 for how the consumer will use the API.

Assignee: nobody → paolo.mozmail

Status: NEW → ASSIGNED

Attachment #8458829 - Flags: feedback?(gavin.sharp)

Marco Mucci [:MarcoM]

Comment 2

•

11 years ago

Hi Paolo, can you provide a point value and if the bug should be marked as [qa+] or [qa-] for verification.

Iteration: --- → 33.3

QA Whiteboard: [qa?]

Flags: needinfo?(paolo.mozmail)

:Paolo Amadini

Assignee

Updated

•

11 years ago

Points: --- → 5

QA Whiteboard: [qa?] → [qa-]

Flags: needinfo?(paolo.mozmail)

:Paolo Amadini

Assignee

Updated

•

11 years ago

Depends on: 1041534

:Paolo Amadini

Assignee

Comment 3

•

11 years ago

Attached patch The patch (obsolete) — Details — Splinter Review

This patch now includes tests for the functionality.

Attachment #8458829 - Attachment is obsolete: true

Attachment #8458829 - Flags: feedback?(gavin.sharp)

Attachment #8459595 - Flags: review?(gavin.sharp)

:Paolo Amadini

Assignee

Comment 4

•

11 years ago

Attached patch Updated patch (obsolete) — Details — Splinter Review

This contains one additional browser-chrome test for the alternate domains addition to the Google engine.

Attachment #8459595 - Attachment is obsolete: true

Attachment #8459595 - Flags: review?(gavin.sharp)

Attachment #8459613 - Flags: review?(gavin.sharp)

:Gavin Sharp [email: gavin@gavinsharp.com]

Comment 5

•

11 years ago

Comment on attachment 8459613 [details] [diff] [review] Updated patch Discussed this Paolo, he will rework this patch to use a separate module for storing the engine name->engine URL maps. We also discussed an alternate interface where more of this logic lives in the search service - I will review that alternate interface proposal once he drafts it.

Attachment #8459613 - Flags: review?(gavin.sharp)

Marco Mucci [:MarcoM]

Updated

•

11 years ago

Iteration: 33.3 → 34.1

Marco Mucci [:MarcoM]

Updated

•

11 years ago

Blocks: 1034381

:Paolo Amadini

Assignee

Comment 6

•

11 years ago

Attached patch New approach (obsolete) — Details — Splinter Review

In the end, I think that this alternate approach we discussed is better and the interface looks simpler, it should also be almost as fast as the other one.

Attachment #8459613 - Attachment is obsolete: true

Attachment #8460215 - Flags: review?(gavin.sharp)

:Paolo Amadini

Assignee

Updated

•

11 years ago

No longer blocks: 959582

Summary: The search service should provide details for parsing search result URLs → Add ability to parse search result URLs

Marco Bonardo [:mak]

Updated

•

11 years ago

Blocks: 1034382

:Paolo Amadini

Assignee

Updated

•

11 years ago

Blocks: 1042604

:Paolo Amadini

Assignee

Comment 7

•

11 years ago

Comment on attachment 8460215 [details] [diff] [review] New approach Review of attachment 8460215 [details] [diff] [review]: ----------------------------------------------------------------- ::: netwerk/base/public/nsIBrowserSearchService.idl @@ +184,5 @@ > AString getResultDomain([optional] in AString responseType); > }; > > +[scriptable, uuid(856a31ff-b451-4101-b12e-ff399485ac8a)] > +interface nsISearchParseSubmissionResult : nsISupports I just wanted to note that, to address the use case from bug 1034382 of coloring the search terms, in a follow-up bug we could just add properties to this interface, for example containing the offset and length of the match in the original URL, or containing three sections of the URL of which the central one is equal to the encoded search terms.

:Gavin Sharp [email: gavin@gavinsharp.com]

Comment 8

•

11 years ago

Comment on attachment 8460215 [details] [diff] [review] New approach >diff --git a/netwerk/base/public/nsIBrowserSearchService.idl b/netwerk/base/public/nsIBrowserSearchService.idl >+ /** >+ * The search engine associated with the submission, or null if the URL does >+ * not represent a search submission. >+ */ >+ readonly attribute nsISearchEngine engine; How about: "The search engine associated with the URL passed in to nsISearchEngine::parseSubmissionURL, ..." >diff --git a/toolkit/components/search/SearchStaticData.jsm b/toolkit/components/search/SearchStaticData.jsm >+ * This module is also easily overridable in case a hotfix is needed. How would this hotfix override work in practice? Have you tested this? >diff --git a/toolkit/components/search/nsSearchService.js b/toolkit/components/search/nsSearchService.js >+ _afterDataChanged: function () { This code would change if you use the approach I outline below, so I haven't reviewed this closely yet. >+const gEmptyParseSubmissionResult = new ParseSubmissionResult(null, ""); I have a slight concern that returning the same empty object for all empty results is a bit odd, but I suppose it's unlikely to matter in practice. >+ parseSubmissionURL: function SRCH_SVC_parseSubmissionURL(aURL) { Some higher-level comments about this: - As written, this will match the first available engine with a matching domain, which might not be desirable in some edge cases (e.g. if you have multiple Google engines installed, particularly if one is a hijacked version!). - It would be nice to avoid having to iterate across all engines twice, and the O(N•M) behavior of the alternateDomains.indexOf inside the second engine loop It seems like both of these could be addressed by having SearchStaticData instead contain a map of domain->default engine name. Then this method can just extract the domain from the URL and get the engine name (and thus nsISearchEngine) directly. What do you think? >diff --git a/toolkit/components/search/tests/xpcshell/data/engine-ad.xml b/toolkit/components/search/tests/xpcshell/data/engine-ad.xml nit: using "ad" is a bit confusing because it's also a word. Maybe pick another TLD that isn't?

Attachment #8460215 - Flags: review?(gavin.sharp) → feedback+

Marco Bonardo [:mak]

Updated

•

11 years ago

Blocks: 1036914

:Paolo Amadini

Assignee

Updated

•

11 years ago

Blocks: 1044577

:Paolo Amadini

Assignee

Comment 9

•

11 years ago

(In reply to :Gavin Sharp [email: gavin@gavinsharp.com] from comment #8) > >diff --git a/toolkit/components/search/SearchStaticData.jsm b/toolkit/components/search/SearchStaticData.jsm > > >+ * This module is also easily overridable in case a hotfix is needed. > > How would this hotfix override work in practice? Have you tested this? I think it would quite easy to declare an override for the JSM in a manifest, but I haven't tested it. I've filed bug 1044577 to test this with a proof-of-concept add-on. > >+const gEmptyParseSubmissionResult = new ParseSubmissionResult(null, ""); > > I have a slight concern that returning the same empty object for all empty > results is a bit odd, but I suppose it's unlikely to matter in practice. It can be a good idea to freeze this, will do. > - As written, this will match the first available engine with a matching > domain, which might not be desirable in some edge cases (e.g. if you have > multiple Google engines installed, particularly if one is a hijacked > version!). I'm not familiar with the type of hijacking you are concerned about. The domain that is checked is always the target domain for the submission, and the URL path is checked, so processing engines in the order chosen by the user should assign them to the right engine. Engines whose main domain is an alternate domain of another engine will take precedence, but this is correct because this means the user installed a version of the same engine in another locale, to use it as a second language search engine on purpose. > - It would be nice to avoid having to iterate across all engines twice, and > the O(N•M) behavior of the alternateDomains.indexOf inside the second engine > loop Definitely, though to be fair in this O(N•M), M is > 0 only for a subset of N. > It seems like both of these could be addressed by having SearchStaticData > instead contain a map of domain->default engine name. Then this method can > just extract the domain from the URL and get the engine name (and thus > nsISearchEngine) directly. What do you think? I think a map can be a good idea, and can be easily built by joining the data from the available search engines with the static data. I think the static data should be limited to domain names only, because the engine names can vary across locales.

:Paolo Amadini

Assignee

Comment 10

•

11 years ago

Attached patch Using a map (obsolete) — Details — Splinter Review

Attachment #8460215 - Attachment is obsolete: true

Attachment #8463124 - Flags: review?(gavin.sharp)

:Paolo Amadini

Assignee

Updated

•

11 years ago

Blocks: 959582
No longer blocks: 1034381

:Gavin Sharp [email: gavin@gavinsharp.com]

Comment 11

•

11 years ago

Comment on attachment 8463124 [details] [diff] [review] Using a map Now I'm a little worried _parseSubmissionMap will get too big :) It's probably not significant, but I wonder: - does about:memory provide useful insight into its size? - can you reduce the size of the map by re-using a single object for each engine, rather than creating a new object for each invocation of processDomain? I.e.: for (let engine of this._sortedEngines) { ... let engineObj = { engine: engine, termsParamName: urlParsingInfo.termsParameterName }; let processDomain = (domain, isAlternate) => { ... engineObj.isAlternate = isAlternate; this._parseSubmissionMap.set(key, engineObj); ...

:Gavin Sharp [email: gavin@gavinsharp.com]

Comment 12

•

11 years ago

Comment on attachment 8463124 [details] [diff] [review] Using a map >diff --git a/toolkit/components/search/SearchStaticData.jsm b/toolkit/components/search/SearchStaticData.jsm >+ * This separate module is also easily overridable using an XPCOM manifest, in >+ * case a hotfix is needed. No high-level processing logic is applied here. I would remove "using an XPCOM manifest" (somewhat of a misnomer). >diff --git a/toolkit/components/search/nsSearchService.js b/toolkit/components/search/nsSearchService.js >+ _getTermsParameterName: function SRCH_EURL__getTermsParameterName() { >+ let queryParam = this.params.find(p => p.value == "{searchTerms}"); I suppose you could use USER_DEFINED here. >+ parseSubmissionURL: function SRCH_SVC_parseSubmissionURL(aURL) { >+ // Exclude any URL that is not HTTP or HTTPS from the beginning. >+ if (soughtUrl.scheme != "http" && soughtUrl.scheme != "https") { Would be useful to document this behavior in the IDL. >+ LOG("The value does not look like a structured URL."); >+ return gEmptyParseSubmissionResult; I wonder whether this method should throw a useful error rather than always returning an empty object - it would make debugging certain failures easier. r=me with these comments addressed.

Attachment #8463124 - Flags: review?(gavin.sharp) → review+

:Paolo Amadini

Assignee

Comment 13

•

11 years ago

(In reply to :Gavin Sharp [email: gavin@gavinsharp.com] from comment #11) > - does about:memory provide useful insight into its size? > - can you reduce the size of the map by re-using a single object for each > engine, rather than creating a new object for each invocation of > processDomain? The memory increase from the map is still in the "tiny" group, seems to be between 15-25 KB. This is the nsSearchService.js memory difference after using the location bar dropdown the first time: ├──0.02 MB (00.27%) -- objects │ ├──0.01 MB (00.16%) ── malloc-heap/slots │ └──0.01 MB (00.10%) ++ gc-heap The optimization is still a good idea, and reduces this size to under 5 KB, I believe. ├──0.00 MB (00.02%) ++ objects

:Paolo Amadini

Assignee

Comment 14

•

11 years ago

(In reply to :Gavin Sharp [email: gavin@gavinsharp.com] from comment #12) > >+ LOG("The value does not look like a structured URL."); > >+ return gEmptyParseSubmissionResult; > > I wonder whether this method should throw a useful error rather than always > returning an empty object - it would make debugging certain failures easier. Error objects and exceptions are expensive to generate and process. Since this is used in the location bar and the non-match exit path is the most common, I believe we should leave this code as it is. The JavaScript debugger in release builds, or the log in debug builds, can still give insight in case of non-obvious reasons for missed matches.

:Paolo Amadini

Assignee

Comment 15

•

11 years ago

Attached patch Final patch — Details — Splinter Review

This implements the optimization and adds the interface documentation. I'll land this version shortly if there aren't more comments.

Attachment #8463124 - Attachment is obsolete: true

:Paolo Amadini

Assignee

Comment 16

•

11 years ago

https://tbpl.mozilla.org/?tree=Try&rev=9f45cc6f451b

:Paolo Amadini

Assignee

Comment 17

•

11 years ago

Fixed typo in one test, rebased and pushed to fx-team: https://hg.mozilla.org/integration/fx-team/rev/482533cb9495

Ryan VanderMeulen [:RyanVM]

Comment 18

•

11 years ago

https://hg.mozilla.org/mozilla-central/rev/482533cb9495

Status: ASSIGNED → RESOLVED

Closed: 11 years ago

Flags: in-testsuite+

Resolution: --- → FIXED

Target Milestone: --- → Firefox 34

Marco Bonardo [:mak]

Updated

•

11 years ago

Blocks: 1047469

:Paolo Amadini

Assignee

Updated

•

11 years ago

Blocks: 1053357

Tim Taubert [:ttaubert] (inactive)

Updated

•

11 years ago

Blocks: 1069903

You need to log in before you can comment on or make changes to this bug.