Closed
Bug 606929
Opened 14 years ago
Closed 7 years ago
amend URIs only with user consent; amendments should be conservative
Categories
(Firefox :: Address Bar, defect)
Tracking
()
RESOLVED
INACTIVE
People
(Reporter: marginal, Unassigned)
References
()
Details
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.7) Gecko/20100202 Ant.com Toolbar 2.0.1 Shiretoko/3.5.7
Build Identifier:
By necessity, some characters in URI strings embedded in HTML or entered by a user require encoding prior to their use in requesting a resource. This is because the underlying network protocols allow only a restricted subset of character representations.
Users of Firefox are generally unaware of this limitation, and are not aware that any transliteration of their resource requests occurs. In many cases such transliterations are unambiguous, and result in Firefox correctly fetching the intended resource.
This is usually desirable, since it is convenient for people to be able to specify a resource using human-language identifiers, which UTF8 text supports well.
However, cases arise where the transliteration may be ambiguous. Identical UTF8 glyphs in many cases may result from more than one sequence of bytes, for example where a single glyph is made of a combined base character and multiple inflection markings, each represented by separate sequences of bytes.
Several different normalisation forms exist for UTF8 text, and transliterating between these can be irreversible. Once a sequence of characters has been combined into a single glyph, it is not possible to discern without further information whether it was originally a single glyph or one of perhaps several sequences which should display identically. Such combination can easily occur, given the vast and complex support which the UTF8 character table provides for human languages which were not the result of a design process that included tabulation as a goal.
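As a minimal sketch of the ambiguity described above (Python is used purely for illustration; the word "café" and all variable names are assumptions, not taken from the report), the same visible text can be stored as two different code-point sequences, which percent-encode to different request strings:

```python
import unicodedata
from urllib.parse import quote

# Two spellings of the same visible word (hypothetical example input).
composed = "caf\u00e9"      # "é" as a single precomposed code point (U+00E9)
decomposed = "cafe\u0301"   # "e" followed by a combining acute accent (U+0301)

# The strings render identically but are not equal as code-point sequences.
same_text = composed == decomposed

# Percent-encoding works on the UTF-8 bytes, so the request strings differ.
composed_encoded = quote(composed, safe="")
decomposed_encoded = quote(decomposed, safe="")

# Normalising to NFC maps the decomposed spelling onto the composed one.
normalised = unicodedata.normalize("NFC", decomposed)

print(same_text)
print(composed_encoded)
print(decomposed_encoded)
print(normalised == composed)
```

After normalisation there is no way to tell which of the two spellings was originally supplied, which is the irreversibility referred to above.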
Internet design guidelines specify to use an exact encoding sub-system of UTF8, but it is very likely that users will not always adhere to this. Furthermore, although internet servers might be designed to 'bridge the gap' in this regard, it seems likely that any such design would be open to a malicious or inadvertent subversion: where a user has supplied a resource using an inappropriate encoding, another user might supply a different resource and label it with a visually and semantically identical name, but with the correct encoding. It is reasonable to expect that any server would fetch the more 'correct' document, without necessarily alerting the user to the existence of another document.
Another specific example arises which is perhaps a bug within Firefox, or has some unexplained cause, which is that Firefox encodes the apostrophe character (0x27 ASCII) into the string "%27" without alerting the user, and against the RFC guidelines for HTTP and XML. This results in some cases in a complete inability to access resources which are provided by standards-compliant web servers.
Given the enormous complexity under a system based on URIs which might be made up of a combination of strings in punycode, 7-bit ASCII, 8-bit region encodings, and UTF8 with four different normalisation forms, it is unlikely that all such bugs will be eliminated from Firefox. And yet since most users do not have technical knowledge in this area, and many users do not themselves use a variety of different languages and encoding systems, awareness of this source of difficulty is likely lacking, resulting in risk and confusion, as well as inconvenience and data loss.
Obstacles facing technical providers would also be reduced if this situation could be alleviated in any way.
Reproducible: Always
Steps to Reproduce:
1. Type text A into the location bar, but don't hit enter yet:
<http://www.xhaus.com/headers/test's fail>
2. Copy text B from the location bar and paste into a plain-text text editor.
3. Go back to the location bar, and hit enter so that the site is visited.
4. Note text C displayed in the "Requested URI" field of the page.
5. Note text D displayed visually in the Location Bar.
6. Copy text E from the location bar, and paste into your text editor.
Actual Results:
text A: http://www.xhaus.com/headers/test's fail
text B: http://www.xhaus.com/headers/test's fail
text C: /headers/test%27s%20fail
text D: http://www.xhaus.com/headers/test's fail
text E: http://www.xhaus.com/headers/test%27s%20fail
The example above uses a spurious test-page, but in the real world errors in this implementation result in loss of access to important resources, and probably result in organisations changing from use of Firefox to other browsers.
Expected Results:
text A: http://www.xhaus.com/headers/test's fail
text B: http://www.xhaus.com/headers/test's fail
At step 3, the user should be alerted that the provided URI is not able to be loaded without amendment. The amended URI should be displayed to the user.
If the user prefers, the user should not be notified in future, but such a preference should be reversible using the Preferences facility of Firefox, and the user should be so-informed if they make this choice.
text C: /headers/test's%20fail
text D: http://www.xhaus.com/headers/test's%20fail
text E: http://www.xhaus.com/headers/test's%20fail
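The difference between the actual and expected values of text C can be reproduced outside the browser. The following is only a sketch: Python's `urllib.parse.quote` stands in for whatever escaping routine Firefox uses (an assumption, not Firefox's code), and the variable names are illustrative.

```python
from urllib.parse import quote

# Path portion of text A, exactly as typed.
typed_path = "/headers/test's fail"

# By default quote() escapes the apostrophe as well as the space,
# which matches the actual text C reported above.
actual_c = quote(typed_path)

# Declaring the apostrophe safe escapes only the space,
# which matches the expected text C.
expected_c = quote(typed_path, safe="/'")

print(actual_c)
print(expected_c)
```

The only change between the two calls is the set of characters declared safe, which is the kind of conservative default this report asks for.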
The text C, D, and E discrepancies reflect the RFCs on the use of apostrophe in URIs, vs Firefox current behaviour, but this is used here merely as an example; there are separate bugreports relating to this topic, see
* https://bugzilla.mozilla.org/show_bug.cgi?id=434211
* https://bugzilla.mozilla.org/show_bug.cgi?id=407172
However it is indicative of the general principle that any mangling of URIs which is to be performed should tend toward less mangling, rather than more. User-preference settings associated with related functions should default to Firefox making fewer changes to its input, rather than more. And in any case where data is to be transformed in an unexpected or lossy manner, the default should be to alert the user verbosely, rather than to make such alterations silently.
Please see also separate bugreports requesting Location Bar features which could be useful in implementing this behaviour.
* https://bugzilla.mozilla.org/show_bug.cgi?id=606800
* https://bugzilla.mozilla.org/show_bug.cgi?id=606889
To understand the degree of complexity and some of the confusing issues which arise, I suggest parsing the following additional bugreports.
* https://bugzilla.mozilla.org/show_bug.cgi?id=84032
* https://bugzilla.mozilla.org/show_bug.cgi?id=412458
* https://bugzilla.mozilla.org/show_bug.cgi?id=407974
* https://bugzilla.mozilla.org/show_bug.cgi?id=408890
* https://bugzilla.mozilla.org/show_bug.cgi?id=124042
* https://bugzilla.mozilla.org/show_bug.cgi?id=494877
* https://bugzilla.mozilla.org/show_bug.cgi?id=434211
* https://bugzilla.mozilla.org/show_bug.cgi?id=407172
Note also that the 'Expected' text at D and E could equally well be
* http://www.xhaus.com/headers/test's fail
or perhaps even
* http://www.xhaus.com/headers/test%27s%20fail
depending on what is chosen as 'correct' outcome for the mangling process and how the user prefers their URIs to be displayed, but those are topics covered in the other bugreports mentioned above.
Comment 2•14 years ago
Asking users whether to escape each character in a URI is a non-starter, I suspect; as you note most users (overwhelmingly; including the technical ones) have neither the knowledge nor the inclination for making such decisions. But such a decision would lie in the front-end anyway, not in the core networking code.
> This results in some cases in a complete inability to access resources which
> are provided by standards-compliant web servers.
If this is specifically referring to the %-encoding of apostrophe, then note that the standard changed since that was implemented (it went from allowed to not allowed), and this issue is already covered by other bugs, which you have clearly read, since you cited them.
Given the lack of "non-conservative" amendments cited other than the apostrophe one, sounds like this bug is just about warning users, so not a core bug.
Component: Networking → Location Bar
Product: Core → Firefox
QA Contact: networking → location.bar
Comment 3•14 years ago
Note, by the way, that it should be quite possible to write an extension to give such a warning, for the very few users who care.
Boris,
I would not expect users to be warned of every character that is to be escaped. Just those that are not REQUIRED to be transliterated in order to send the request. This would include any reverse-escape which results in re-escaping to a different string, escaping of UTF8 strings that in the literal version provided do not conform to the UTF8 normalisation form recommended by the protocol being used, and cases exemplified by the apostrophe case.
I do not assume that the apostrophe case will be resolved any time soon, and I do assume that even if other similar cases do not exist presently, they may at some future time.
It is because of the very lack of interest and knowledge mentioned that I would like to see this feature implemented. As I mentioned at the outset, users have no idea of the fact that their input may be altered. If the alert is one which can be disabled, in the same way that many similar 'boring' alerts such as the http/https mixed frame page alerts in common browsers can be disabled, then the user can do that, and at least they have learnt not to rely on the browser to always fetch resources using the exact input they provide.
Unfortunately, as explained above, this is the new reality, given the complexity of human language, and the efficiency of UTF8 in reflecting the appearance of human language, at the expense of some redundancy in the specific encoding of so many cases.
This needs to be part of the browser itself, not an optional extension, because its purpose is to warn the user of these concerns. Since the user, as we both agree, in most cases is unaware and apathetic wrt this concern, then they are unaware and apathetic as to the need to employ the extension also.
In my own case, I am still developing my application, and not sure which way it will go, the bug may not actually affect my own work other than having caused me significant confusion, headache and stress.
But I am aware through my research that the apostrophe encoding issue has broken some complex web application software. I am not aware of what remedy if any was applied. In private networks, I would assume that such a breakage would mean simply mandating that another browser be used, since it is likely that in such situations significant complexity would have been built on top of the basic query system.
In a situation such as this, having the browser make the changes silently is dishonest, and puts the onus onto the administrators of the site to try to explain to the users what is going on. "Trust us." Firefox should be honest and up-front about what is going on, and admit that it may have faults, not use sleight-of-hand and hope that the user will be too confused to discover why just sometimes things don't work. "Oh the internet is not reliable, that's all. Get used to it." Well, true it is not always possible to make systems that cover every circumstance, but Firefox can at least give a clue in what direction to look for a resolution or workaround. Please?
And no, the apostrophe bug is not the only one. Also linked was a NULL byte encoding bug, similar to the apostrophe one in that it related to encoding and decoding operations applied to the URI, and it would have an effect incomprehensible to an ordinary user without further explanation. "Oh, it's just broken. I'll do something else."
Other similar bugs will emerge from time to time. UTF8 -- supporting all living human scripts and many dead ones to boot -- is *hard*. ;-)
oops .. my explanation still is probably not 100% clear, sorry, it's very complex isn't it, and I am not remotely familiar with Firefox internals.
Please replace 'This would include' with 'and also'.
"Just those that are not REQUIRED to be transliterated in order to send the
request, and also any reverse-escape which results in re-escaping"
There may be other cases which should be included, and I thank you for helping to clarify, yes, a pop-up warning for every conversion would be annoying even to those interested in the information in cases where it might matter. Firefox should not do that.
But a colour-coding of the URI would be a useful feature, showing that the URI displayed would not be sent unamended. The other option would be providing a way for the user to manually choose which version of the URI is displayed, as recommended in the other bugreports I filed recently,
* https://bugzilla.mozilla.org/show_bug.cgi?id=606800
* https://bugzilla.mozilla.org/show_bug.cgi?id=606889
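The rule discussed in these comments (flag only those amendments that are not required in order to send the request) could be sketched roughly as follows. This is an illustration under stated assumptions, not Firefox code: the function `amendment_warnings`, the constant `PATH_SAFE` and the warning texts are hypothetical, NFC is assumed to be the recommended normalisation form, and only the path part of a URI is considered.

```python
import unicodedata
from urllib.parse import quote, unquote

# Punctuation that RFC 3986 permits unescaped in a path (pchar plus "/").
# Letters and digits are never escaped by quote(), so they are not listed.
PATH_SAFE = "/-._~!$&'()*+,;=:@"


def amendment_warnings(typed_path: str, sent_path: str) -> list[str]:
    """Return reasons why sent_path is not a minimal amendment of typed_path."""
    warnings = []

    # Least-amended form: existing "%" signs are kept as typed (a
    # simplification: a stray "%" is not validated), all else is
    # escaped only where the path grammar requires it.
    minimal = quote(typed_path, safe=PATH_SAFE + "%")
    if sent_path != minimal:
        warnings.append("sent form differs from the minimal amendment")

    # Unescaping and then re-escaping should reproduce the same string.
    if quote(unquote(typed_path), safe=PATH_SAFE) != minimal:
        warnings.append("unescape and re-escape yields a different string")

    # Typed text not already in NFC (assumed form) would be altered.
    if unicodedata.normalize("NFC", typed_path) != typed_path:
        warnings.append("input is not in normalisation form NFC")

    return warnings


print(amendment_warnings("/headers/test's fail", "/headers/test%27s%20fail"))
print(amendment_warnings("/headers/test's fail", "/headers/test's%20fail"))
```

A front end could use a non-empty result to colour-code the URI or show a dismissible alert, and stay silent when the list is empty.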
Updated•14 years ago
Version: unspecified → 3.5 Branch
Comment 7•7 years ago
Per policy at https://wiki.mozilla.org/Bug_Triage/Projects/Bug_Handling/Bug_Husbandry#Inactive_Bugs. If this bug is not an enhancement request or a bug not present in a supported release of Firefox, then it may be reopened.
Status: UNCONFIRMED → RESOLVED
Closed: 7 years ago
Resolution: --- → INACTIVE