817374 - URIs in places DB and created by nsIIOService.newURI() are not normalized wrt character escaping

Reporter

Description

12 years ago

nsIIOService.newURI() produces an nsIURI with special (e.g. UTF-8) characters escaped with uppercase A-F (e.g. %7E). However, the browser history and bookmarks use lowercase escapes (e.g. %7e). As a result, a history or bookmarks query for that nsIURI returns no matches, even if the page in question is visited or bookmarked. Character escaping should be case-consistent across all browser components to prevent this problem.

Boris Zbarsky [:bzbarsky]

Comment 1

12 years ago

> However, the browser history and bookmarks use lowercase escapes

How are they managing that? Necko uses uppercase escapes, as you noted. encodeURI and encodeURIComponent use uppercase escapes. So how is Places ending up with lowercase escapes, exactly? Is it rolling its own escaping?

Component: General → Places

Product: Core → Toolkit

:Gavin Sharp [email: gavin@gavinsharp.com]

Comment 2

12 years ago

(In reply to jha from comment #0)
> However, the browser history and bookmarks use lowercase escapes (e.g. %7e).

I don't understand what this means either. I don't know of any escaping code specific to history/bookmarks. jha, do you have steps to reproduce the issue?

jha

Reporter

Comment 3

12 years ago

Sorry, my bad, this may or may not be a bug after all. I ran into this while trying to figure out why my Link Status Redux add-on does not show some visited links as visited. It turns out that the links in question were already escaped in lowercase.

The add-on gets the link URL by replacing XULBrowserWindow.setOverLink. This URL has been unescaped (to produce more readable overlink text), and when it gets escaped again by feeding it to nsIIOService.newURI(), the escapes are in uppercase and the URI won't match in a history query. I could work around this by converting the escapes to lowercase, so I just assumed (yeah, I know ;) that there was some escape case-inconsistency. (When bookmarked, the links that didn't work correctly showed as lowercase-escaped when hovering over the bookmark and worked just fine.)

Now the question that remains is this: if a link specifies an href of "foo%7ebar", which is converted to "foo~bar" when displayed, which in turn is escaped to "foo%7Ebar", should a places query return "foo%7ebar" as a result when queried for "foo%7Ebar"? (After all, if I understand correctly, %7E and %7e in URLs mean the same thing.) If not (and I admit this is not a very critical issue), then this bug can be closed as invalid.
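The equivalence asked about above ("%7E" versus "%7e") can be illustrated with a small sketch. It assumes RFC 3986 semantics, where the hex digits of a percent-escape are case-insensitive. The helper names `upperCaseEscapes` and `sameIgnoringEscapeCase` are hypothetical and are not part of Necko or Places; this only shows what an escape-case-insensitive comparison would look like, not how a places query actually compares URIs.

```javascript
// Hypothetical helpers for illustration only; not a Mozilla API.

// Uppercase the hex digits of every percent-escape ("%7e" -> "%7E"),
// leaving all other characters (including path case) untouched.
function upperCaseEscapes(spec) {
  return spec.replace(/%[0-9a-fA-F]{2}/g, (esc) => esc.toUpperCase());
}

// Compare two URI strings, ignoring only the case of escape hex digits.
function sameIgnoringEscapeCase(a, b) {
  return upperCaseEscapes(a) === upperCaseEscapes(b);
}

console.log(sameIgnoringEscapeCase("foo%7ebar", "foo%7Ebar")); // true
console.log(sameIgnoringEscapeCase("/FOO", "/foo"));           // false
```

A plain string comparison of the same two escaped forms reports a mismatch, which is consistent with the behavior described in this comment.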

:Gavin Sharp [email: gavin@gavinsharp.com]

Comment 4

12 years ago

Are places history queries case-sensitive by default? How are you querying places exactly?

jha

Reporter

Comment 5

12 years ago

> Are places history queries case-sensitive by default?

Apparently, and they should be, as the path component is case-sensitive (excluding the escapes I guess): /FOO is a different page from /foo. (Whereas the host component is case-insensitive; is it converted to lowercase when storing to places db?)

If the places db is supposed to store "normalized" URLs (is it?), then maybe any existing escapes should be converted to uppercase before storing to the db? For example, my workaround of lowercasing the escapes in the query URI won't work if the original link is only partially escaped or escaped in mixed case, i.e. something like "foo%7e%7Ebar" is stored in the db.

> How are you querying places exactly?

Like this (the workaround not shown here):

    var iosvc = Components.classes["@mozilla.org/network/io-service;1"]
                          .getService(Components.interfaces.nsIIOService);
    var histsvc = Components.classes["@mozilla.org/browser/nav-history-service;1"]
                            .getService(Components.interfaces.nsIGlobalHistory2);
    var uri = iosvc.newURI(link, null, null);
    if (histsvc.isVisited(uri)) {
      // to get the visit date
      histsvc.QueryInterface(Components.interfaces.nsINavHistoryService);
      var query = histsvc.getNewQuery();
      query.uri = uri;
      var queryOptions = histsvc.getNewQueryOptions();
      queryOptions.includeHidden = true;
      queryOptions.maxResults = 1;
      var results = histsvc.executeQuery(query, queryOptions).root;
      ...
    }

isVisited() will return false for "foo%7Ebar" if the href of the link to the visited page was "foo%7ebar".
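The limitation of the case-conversion workaround mentioned in this comment can be sketched as follows. This is a minimal illustration in plain JavaScript, assuming the stored URL is compared as an exact string; `escapeCaseVariants` is a hypothetical helper, and the `stored` value is the mixed-case example from the comment, not real database content.

```javascript
// Hypothetical helper for illustration only; not a Mozilla API.
// Returns the all-uppercase and all-lowercase escape variants of a URI string.
function escapeCaseVariants(spec) {
  const escapeRe = /%[0-9a-fA-F]{2}/g;
  const upper = spec.replace(escapeRe, (e) => e.toUpperCase());
  const lower = spec.replace(escapeRe, (e) => e.toLowerCase());
  // A Set removes the duplicate when the URI contains no letter hex digits.
  return [...new Set([upper, lower])];
}

// Assumed stored form: the mixed-case example from the comment.
const stored = "foo%7e%7Ebar";

// Candidate query strings derived from an uppercase-escaped URI.
const candidates = escapeCaseVariants("foo%7E%7Ebar");

console.log(candidates);                  // [ 'foo%7E%7Ebar', 'foo%7e%7ebar' ]
console.log(candidates.includes(stored)); // false: mixed case matches neither
```

Querying both variants covers consistently escaped links, but a URL with n letter-bearing escapes has up to 2^n case combinations, so enumerating variants on the query side does not scale.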

jha

Reporter

Updated

12 years ago

Summary: nsIIOService.newURI() escapes special characters with uppercase A-F → URIs in places DB and created by nsIIOService.newURI() are not normalized wrt character escaping

jha

Reporter

Comment 6

12 years ago

Attached file: test cases for the problem

jha

Reporter

Comment 7

12 years ago

OK, here's the correct description of the problem; I also updated the title to reflect this:

URIs stored in the places DB and those produced by nsIIOService.newURI() (which the places DB possibly uses) are not normalized (unambiguous) with regard to escaped characters. This causes problems when querying the places DB for a URI that is originally in unescaped form, which will be escaped with uppercase A-F by nsIIOService.newURI().

There are two aspects of the problem:

1. Escaped special characters already present in the URI are left as-is and not ensured to be uppercase. If a visited link is "%c3%84" (UTF-8), it is stored unchanged into the DB. If you refer to this in the unescaped form "Ä", which is escaped to "%C3%84" by nsIIOService.newURI(), you won't get any matches from a history query. This can be worked around by creating a second URI by lowercasing all escapes in the uri.spec and also querying that URI, but this won't help if the entry in the DB contains mixed-case escapes.

2. Escaped non-special characters in the URI are not unescaped. The tilde is one such character: if a visited link is "%7E", the unescaped form "~" is not escaped by nsIIOService.newURI() and thus won't be found in a history query.

Whether this behavior is a bug or not depends on whether nsIIOService.newURI() is supposed to create normalized URIs and whether URIs stored in the places DB are supposed to be normalized. If so, a solution would be to first unescape and then re-escape the URI before storing it to the DB (I don't know whether there would be any problems with this approach, though).

I have attached an HTML file containing test cases for the problem.
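Both aspects described above correspond to normalization steps in RFC 3986 section 6.2.2: uppercase the hex digits of escapes, and decode escapes of unreserved characters (letters, digits, "-", ".", "_", "~"). The sketch below applies those two steps to a string. It is not what nsIIOService.newURI() or Places does, and it differs from the "unescape, then re-escape" idea in the comment because it never decodes reserved or non-ASCII bytes. The name `normalizeEscapes` is hypothetical.

```javascript
// Hypothetical helper for illustration only; not a Mozilla API.
// Unreserved characters per RFC 3986: ALPHA / DIGIT / "-" / "." / "_" / "~".
const UNRESERVED = /^[A-Za-z0-9\-._~]$/;

// Decode escapes of unreserved characters; uppercase all other escapes.
function normalizeEscapes(spec) {
  return spec.replace(/%([0-9a-fA-F]{2})/g, (match, hex) => {
    const ch = String.fromCharCode(parseInt(hex, 16));
    return UNRESERVED.test(ch) ? ch : "%" + hex.toUpperCase();
  });
}

console.log(normalizeEscapes("%c3%84"));       // %C3%84
console.log(normalizeEscapes("foo%7ebar"));    // foo~bar
console.log(normalizeEscapes("foo%7e%7Ebar")); // foo~~bar
console.log(normalizeEscapes("a%2fb"));        // a%2Fb ("/" is reserved, stays escaped)
```

If the same function were applied both to the stored URL and to the query URL, the two test-case families above ("%c3%84" versus "%C3%84", and "%7E" versus "~") would compare equal as plain strings. Whether Places should normalize at storage time is the open question of this bug.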

jha

Reporter

Updated

12 years ago

Attachment #688173 - Attachment mime type: text/plain → text/html

test cases for the problem (12 years ago, jha, 1.05 KB, text/html)
Bug 817374 - Fix inconsistency of the setOverLink parameter. (8 years ago, Masatoshi Kimura [:emk], 58 bytes, text/x-review-board-request, mossop: review+)