Goodreads Search Engine returns blank page (UTF-8 params aren't correctly URL encoded)

NEW
Unassigned

Status

()

Firefox
Search
3 years ago
5 months ago

People

(Reporter: Ioana Chiorean, Unassigned)

Tracking

Trunk
ARM
Android
Points:
---

Firefox Tracking Flags

(firefox36 affected, firefox37 affected, firefox38 affected)

Details

Attachments

(2 attachments)

(Reporter)

Description

3 years ago
[Tracking Requested - why for this release]:

Firefox 37.0a2, 38.0a1, 36.0b1 
All devices

Steps: 
1. Go to goodreads.com and add the search engine ( near the bottom of the page).
2. Go to URL bar and search for an author/term with Goodreads search engine

Expected Results:
- a list of results should be displayed - consistent with the search in the page 

Actual Results:
- a blank page is displayed - the search is run with different UFT parameters
- added search engine: http://www.goodreads.com/search?utf8=%u2713&query=eminescu
- in page search: http://www.goodreads.com/search?utf8=%E2%9C%93&query=eminescu
Interesting. %E2%9C%93 is the URL encoded hex represenation of ✓. u2713 is I guess a "source code" hex (there's probably a real word for this--code point maybe?), e.g. in Python it would be u"\u2713".

Their opensearch.xml looks like so (http://www.goodreads.com/opensearch.xml):

<?xml version="1.0" encoding="UTF-8"?>
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>Goodreads</ShortName>
  <Description>Search Goodreads.com</Description>
  <Tags>goodreads, books, reading</Tags>
  <Url type="text/html" method="GET" template="http://www.goodreads.com/search/search">
    <Param name="search_type" value="books"/>
    <Param name="search[query]" value="{searchTerms}"/>
  </Url>
  <LongName>Goodreads Search</LongName>
  <Image height="16" width="16" type="image/vnd.microsoft.icon">http://www.goodreads.com/favicon.ico</Image>
  <Query role="example" searchTerms="goodreads" />
  <Attribution>Search data copyright Goodreads, Inc</Attribution>
  <SyndicationRight>open</SyndicationRight>
  <AdultContent>false</AdultContent>
  <Language>en-us</Language>
  <OutputEncoding>UTF-8</OutputEncoding>
  <InputEncoding>UTF-8</InputEncoding>
</OpenSearchDescription>

Rails uses this utf8=✓ to detect IE bugs, IIRC. Could be that there's some kind of opensearch.xml -> actual search encoding issue.

Although, in Chrome desktop if I add the search engine the URL I see is "http://www.goodreads.com/search?utf8=%E2%9C%93&query=%s". In Firefox Desktop I get "http://www.goodreads.com/search?utf8=%u2713&query=foo" as well. Needs some more investigation--this may be a core issue.
Whiteboard: [country-all][notcontactready]
Component: Mobile → Search
Product: Tech Evangelism → Firefox
Moving to Firefox::Search since this affects Desktop + Mobile.
Summary: Goodreads Search Engine returns blank page → Goodreads Search Engine returns blank page (UTF-8 params aren't correctly URL encoded)
Created attachment 8550248 [details]
1121903.htm - demo
Probably not related to opensearch.xml (?) - just a matter of how Firefox registers the values of other form elements when creating a keyword search.

(This looks like an important bug for L10N. The search keyword functionality is likely broken in many locales that use non-ASCII encodings and include extra data in search forms.)
I'm confused about references to "add the search engine", OpenSearch, and keyword search. OpenSearch and keyword searches are entirely separate - which one is the problem here?

Can someone clarify the exact STR for experiencing the bug? Comment 0 is a bit low on detail about what exactly "add the search engine" entails, and which query term is used.
Oh, I see it now - it's the "add keyword for this search" that's an issue. OpenSearch is not related.
Component: Search → General
Thanks Gavin.

Here's some more clear STR:

1) navigate to http://www.goodreads.com/
2) Right-click in search input (near bottom of page), select Add a keyword for this search
3) Add keyword via dialog (I used "gr")
4) Use keyword to do a test search via URL bar: "gr fiction"

Expected: 
Search page results, similar to Chrome: http://www.goodreads.com/search?utf8=%E2%9C%93&query=fiction

Actual:
Blank page with following URL: http://www.goodreads.com/search?utf8=%u2713&query=fiction
Created attachment 8551974 [details] [diff] [review]
potential patch

This seems to fix this particular case, at least. The param encoding code for when the keyword is saved (https://hg.mozilla.org/mozilla-central/annotate/tip/browser/base/content/browser.js#l6650) differs slightly from the param encoding code for the search term when used (https://hg.mozilla.org/mozilla-central/annotate/tip/browser/base/content/browser.js#l2123), this patch (I think) makes them consistent.
(Reporter)

Comment 9

3 years ago
Gavin, Mike - just to clear this out the bug was initial for mobile and no keyword involved. 

Steps: 
1. Go to goodreads.com 
2. Scroll to the search bar ( near the bottom of the page).
3. Tap twice in the search bar so the input menu appears 
4. Tap the magnifying loop with + so you can Add the search engine
2. Go to URL bar and search for an author/term with Goodreads search engine

See video here: http://youtu.be/lZlNpdLWI-U
Attachment #8551974 - Flags: review?(mak77)
Assignee: nobody → gavin.sharp
Whiteboard: [country-all][notcontactready]
Comment on attachment 8551974 [details] [diff] [review]
potential patch

Review of attachment 8551974 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/base/content/browser.js
@@ +6653,2 @@
>      else
> +      return escapeFn(aName) + "=" + escapeFn(aValue);

I'm not sure this will work as expected due to this:

http://mxr.mozilla.org/mozilla-central/source/browser/base/content/browser.js#2110
2109   if (postData)
2110     escapedPostData = unescape(postData);

Now, if you escape("✓"), you get %u2713 and unescape gives you ✓
But if you encodeURIComponent("✓") you get %E2%9C%93 and unescape gives you broken chars like ✓

But there is more in the keywords code that makes me mad (and sad), maybe you have an insight about these, cause I cannot figure it out:

1. I'm not sure the story of escape vs encodeURIComponent at http://mxr.mozilla.org/mozilla-central/source/browser/base/content/browser.js#2118 is very sound, encodeURIComponent generates pairs that should not only be valid for UTF-8, it generates hex pairs valid for an rfc uri. An urlencoded form is not only valid in utf-8 and likely it uses encodeURIComponent...

2. I'm not sure why for url encoded forms we escape "name=value", while in the other cases we escape them separately. "=" and + are allowed, so the if/else here doesn't seem to make sense, it could always escape name and value separately. But even worse, why do we escape urlencoded post data that is already escaped? This completely puzzles me. It looks like we try to build some sort of "sane" representation to store it in a place that doesn't like "="(?).
The worst thing is that if we fix it, we break existing keywords.

3. since later we unescape postData that we escaped before (obtaining already-escaped-by-the-page postData), I wonder why we escaped it at all. Couldn't we store it in a ready-to-be-used state?

4. Maybe we could just use encodeURIComponent for GET urls and stop the crazy escape/encode mixup, but again, there's risk to break existing keywords :(

So, maybe I'm mad, but I think this code is a pile of nonsense.
It looks like it could just store postData as-is, encodeURIComponent anything else, and stop using escape. But, this thing is very hairy and I might be missing something, unfortunately who wrote this code originally is not with us anymore.
Attachment #8551974 - Flags: review?(mak77)
regardless what we do, it might be worth to extend (or dupe) browser_addKeywordSearch.js to cover this case.

Updated

3 years ago
Duplicate of this bug: 1127516
Thanks for diving into this (even just a little bit). I agree that this code is a mess, but I doubt we'll be able to untangle it any time soon.

I'm not sure I understand the potential issue you're describing - in my testing this patch worked, at least for the testcase in this bug. Are you suggesting that it wouldn't work for the testcase in this bug, or that it would break some other testcase?

We should definitely add a test.
Flags: needinfo?(mak77)

Comment 14

2 years ago
This happens with GitHub's search engine, too. Must be a Rails-wide issue.
Assignee: gavin.sharp → nobody
Component: General → Search
Flags: needinfo?(mak77)

Comment 15

5 months ago
I have this problem using Firefox 55.02(64 bit) on Ubuntu.

Comment 16

5 months ago
Here’s a workaround for users with this issue.

Edit the search bookmark keyword, usually by right-clicking it and selecting “Properties”.

The Location field should look something like this:

    https://example.com/search?utf8=%u2713&q=%s

The important part is the mangled “utf8=%u2713” parameter — the server expects “utf8=✓”, or it will refuse the search.

If you replace “utf8=%u2713” with “utf8=✓” in the location field, it should perform the search correctly. If that didn’t help, try removing the utf8 parameter entirely — which would produce something like this:

    https://example.com/search?q=%s
You need to log in before you can comment on or make changes to this bug.