Closed Bug 1121903 Opened 9 years ago Closed 10 months ago

Goodreads Search Engine returns blank page (UTF-8 params aren't correctly URL encoded)

Categories

(Firefox :: Search, defect, P3)

ARM
Android
defect

Tracking

()

VERIFIED FIXED
116 Branch
Tracking Status
firefox116 --- verified

People

(Reporter: ioana.chiorean, Assigned: daleharvey)

References

Details

(Whiteboard: [snt-scrubbed][search-tech-debt])

Attachments

(3 files)

[Tracking Requested - why for this release]:

Firefox 37.0a2, 38.0a1, 36.0b1 
All devices

Steps: 
1. Go to goodreads.com and add the search engine ( near the bottom of the page).
2. Go to URL bar and search for an author/term with Goodreads search engine

Expected Results:
- a list of results should be displayed - consistent with the search in the page 

Actual Results:
- a blank page is displayed - the search is run with different UFT parameters
- added search engine: http://www.goodreads.com/search?utf8=%u2713&query=eminescu
- in page search: http://www.goodreads.com/search?utf8=%E2%9C%93&query=eminescu
Interesting. %E2%9C%93 is the URL encoded hex represenation of ✓. u2713 is I guess a "source code" hex (there's probably a real word for this--code point maybe?), e.g. in Python it would be u"\u2713".

Their opensearch.xml looks like so (http://www.goodreads.com/opensearch.xml):

<?xml version="1.0" encoding="UTF-8"?>
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>Goodreads</ShortName>
  <Description>Search Goodreads.com</Description>
  <Tags>goodreads, books, reading</Tags>
  <Url type="text/html" method="GET" template="http://www.goodreads.com/search/search">
    <Param name="search_type" value="books"/>
    <Param name="search[query]" value="{searchTerms}"/>
  </Url>
  <LongName>Goodreads Search</LongName>
  <Image height="16" width="16" type="image/vnd.microsoft.icon">http://www.goodreads.com/favicon.ico</Image>
  <Query role="example" searchTerms="goodreads" />
  <Attribution>Search data copyright Goodreads, Inc</Attribution>
  <SyndicationRight>open</SyndicationRight>
  <AdultContent>false</AdultContent>
  <Language>en-us</Language>
  <OutputEncoding>UTF-8</OutputEncoding>
  <InputEncoding>UTF-8</InputEncoding>
</OpenSearchDescription>

Rails uses this utf8=✓ to detect IE bugs, IIRC. Could be that there's some kind of opensearch.xml -> actual search encoding issue.

Although, in Chrome desktop if I add the search engine the URL I see is "http://www.goodreads.com/search?utf8=%E2%9C%93&query=%s". In Firefox Desktop I get "http://www.goodreads.com/search?utf8=%u2713&query=foo" as well. Needs some more investigation--this may be a core issue.
Whiteboard: [country-all][notcontactready]
Component: Mobile → Search
Product: Tech Evangelism → Firefox
Moving to Firefox::Search since this affects Desktop + Mobile.
Summary: Goodreads Search Engine returns blank page → Goodreads Search Engine returns blank page (UTF-8 params aren't correctly URL encoded)
Probably not related to opensearch.xml (?) - just a matter of how Firefox registers the values of other form elements when creating a keyword search.

(This looks like an important bug for L10N. The search keyword functionality is likely broken in many locales that use non-ASCII encodings and include extra data in search forms.)
I'm confused about references to "add the search engine", OpenSearch, and keyword search. OpenSearch and keyword searches are entirely separate - which one is the problem here?

Can someone clarify the exact STR for experiencing the bug? Comment 0 is a bit low on detail about what exactly "add the search engine" entails, and which query term is used.
Oh, I see it now - it's the "add keyword for this search" that's an issue. OpenSearch is not related.
Component: Search → General
Thanks Gavin.

Here's some more clear STR:

1) navigate to http://www.goodreads.com/
2) Right-click in search input (near bottom of page), select Add a keyword for this search
3) Add keyword via dialog (I used "gr")
4) Use keyword to do a test search via URL bar: "gr fiction"

Expected: 
Search page results, similar to Chrome: http://www.goodreads.com/search?utf8=%E2%9C%93&query=fiction

Actual:
Blank page with following URL: http://www.goodreads.com/search?utf8=%u2713&query=fiction
Attached patch potential patchSplinter Review
This seems to fix this particular case, at least. The param encoding code for when the keyword is saved (https://hg.mozilla.org/mozilla-central/annotate/tip/browser/base/content/browser.js#l6650) differs slightly from the param encoding code for the search term when used (https://hg.mozilla.org/mozilla-central/annotate/tip/browser/base/content/browser.js#l2123), this patch (I think) makes them consistent.
Gavin, Mike - just to clear this out the bug was initial for mobile and no keyword involved. 

Steps: 
1. Go to goodreads.com 
2. Scroll to the search bar ( near the bottom of the page).
3. Tap twice in the search bar so the input menu appears 
4. Tap the magnifying loop with + so you can Add the search engine
2. Go to URL bar and search for an author/term with Goodreads search engine

See video here: http://youtu.be/lZlNpdLWI-U
Assignee: nobody → gavin.sharp
Whiteboard: [country-all][notcontactready]
Comment on attachment 8551974 [details] [diff] [review]
potential patch

Review of attachment 8551974 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/base/content/browser.js
@@ +6653,2 @@
>      else
> +      return escapeFn(aName) + "=" + escapeFn(aValue);

I'm not sure this will work as expected due to this:

http://mxr.mozilla.org/mozilla-central/source/browser/base/content/browser.js#2110
2109   if (postData)
2110     escapedPostData = unescape(postData);

Now, if you escape("✓"), you get %u2713 and unescape gives you ✓
But if you encodeURIComponent("✓") you get %E2%9C%93 and unescape gives you broken chars like â

But there is more in the keywords code that makes me mad (and sad), maybe you have an insight about these, cause I cannot figure it out:

1. I'm not sure the story of escape vs encodeURIComponent at http://mxr.mozilla.org/mozilla-central/source/browser/base/content/browser.js#2118 is very sound, encodeURIComponent generates pairs that should not only be valid for UTF-8, it generates hex pairs valid for an rfc uri. An urlencoded form is not only valid in utf-8 and likely it uses encodeURIComponent...

2. I'm not sure why for url encoded forms we escape "name=value", while in the other cases we escape them separately. "=" and + are allowed, so the if/else here doesn't seem to make sense, it could always escape name and value separately. But even worse, why do we escape urlencoded post data that is already escaped? This completely puzzles me. It looks like we try to build some sort of "sane" representation to store it in a place that doesn't like "="(?).
The worst thing is that if we fix it, we break existing keywords.

3. since later we unescape postData that we escaped before (obtaining already-escaped-by-the-page postData), I wonder why we escaped it at all. Couldn't we store it in a ready-to-be-used state?

4. Maybe we could just use encodeURIComponent for GET urls and stop the crazy escape/encode mixup, but again, there's risk to break existing keywords :(

So, maybe I'm mad, but I think this code is a pile of nonsense.
It looks like it could just store postData as-is, encodeURIComponent anything else, and stop using escape. But, this thing is very hairy and I might be missing something, unfortunately who wrote this code originally is not with us anymore.
Attachment #8551974 - Flags: review?(mak77)
regardless what we do, it might be worth to extend (or dupe) browser_addKeywordSearch.js to cover this case.
Thanks for diving into this (even just a little bit). I agree that this code is a mess, but I doubt we'll be able to untangle it any time soon.

I'm not sure I understand the potential issue you're describing - in my testing this patch worked, at least for the testcase in this bug. Are you suggesting that it wouldn't work for the testcase in this bug, or that it would break some other testcase?

We should definitely add a test.
Flags: needinfo?(mak77)
This happens with GitHub's search engine, too. Must be a Rails-wide issue.
Assignee: gavin.sharp → nobody
Component: General → Search
Flags: needinfo?(mak77)
I have this problem using Firefox 55.02(64 bit) on Ubuntu.
Here’s a workaround for users with this issue.

Edit the search bookmark keyword, usually by right-clicking it and selecting “Properties”.

The Location field should look something like this:

    https://example.com/search?utf8=%u2713&q=%s

The important part is the mangled “utf8=%u2713” parameter — the server expects “utf8=✓”, or it will refuse the search.

If you replace “utf8=%u2713” with “utf8=✓” in the location field, it should perform the search correctly. If that didn’t help, try removing the utf8 parameter entirely — which would produce something like this:

    https://example.com/search?q=%s
Priority: -- → P2
Priority: P2 → P3
Whiteboard: [snt-scrubbed][search-tech-debt]

Since there was already a WIP patch for this bug, we want to continue that and verify what we're receiving in the get and post cases. So there's no further regressions.

Assignee: nobody → dharvey
Pushed by dharvey@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8ea86587da61
Use encodeURIComponent for keyword url encoding. r=mak
Status: NEW → RESOLVED
Closed: 10 months ago
Resolution: --- → FIXED
Target Milestone: --- → 116 Branch

Reproducible on a 2023-06-21 Nightly build on Windows 10 using the STR from Comment 7.
Verified as fixed on Nightly 116.0a1(build ID: 20230626215144) on Windows 10, macOS 12, Ubuntu 22.
After submitting the search the results page is displayed properly.

Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: