Closed Bug 408890 Opened 17 years ago Closed 6 years ago

URL is not escaped when a part is copied, but is escaped when the whole URL is copied

Categories

(Firefox :: Address Bar, defect)

defect
Not set
normal

RESOLVED INACTIVE

People

(Reporter: Aleksej, Unassigned)

Details

(Keywords: intl)

Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.9b3pre) Gecko/2007121804 Minefield/3.0b3pre

1. Go to any URL with non-ASCII characters, like <http://localhost/тест> or <http://ru.wikipedia.org/> (in the latter case, you should be redirected to <http://ru.wikipedia.org/wiki/Заглавная_страница>).
2a. Try copying the whole URL from the location bar.
2b. Try copying any part of the URL.

Actual results:
2a. The non-ASCII part is encoded like …%D0%97%D0%B0%D0%B3%D…
2b. Any part of the URL you copy, unless it is the whole URL, is not encoded in the clipboard, and shows as in step 1 when pasted into a text editor.

Not sure it’s really a bug, but it is not terribly useful, either.
Blocks: 366797
It's intended. See bug 105909 comment 43 and the following comments.
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → INVALID
(In reply to comment #1)
> It's intended. See bug 105909 comment 43 and the following comments.
> 

Reopening.  It's really confusing that selecting the whole URL gets you an escaped version, but selecting only part gets you an unescaped version.
Status: RESOLVED → REOPENED
Resolution: INVALID → ---
That you find it confusing doesn't mean it's wrong. It's like that for a reason, and there are no "expected results" in comment 0 either. So at this point this bug needs more input in order to make sense.
I believe that the expected results are that what's copied out of the location bar always looks like what's displayed in the location bar. Put another way:

Actual results:
  Copying entire URL results in http://ru.wikipedia.org/wiki/%D0%97%D0%B0%D0%B3%D0%BB%D0%B0%D0%B2%D0%BD%D0%B0%D1%8F
  Copying partial URL results in ttp://ru.wikipedia.org/wiki/Заглавная

Expected results:
  Copying entire URL results in http://ru.wikipedia.org/wiki/Заглавная
  Copying partial URL results in ttp://ru.wikipedia.org/wiki/Заглавная
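The difference between the actual results above can be modelled in a few lines. This is only a sketch of the reported behaviour, not Firefox code: `copy_from_location_bar` is a hypothetical helper, and the choice of `:` and `/` as the characters left unescaped is an assumption made for this example.

```python
from urllib.parse import quote


def copy_from_location_bar(displayed: str, start: int, end: int) -> str:
    """Hypothetical model of the reported behaviour (not Firefox code)."""
    selection = displayed[start:end]
    whole_url_selected = start == 0 and end == len(displayed)
    if whole_url_selected:
        # Non-ASCII characters become percent-encoded UTF-8 bytes;
        # ':' and '/' are kept so the URL structure survives (assumption).
        return quote(selection, safe=":/")
    # A partial selection is copied exactly as displayed.
    return selection


url = "http://ru.wikipedia.org/wiki/Заглавная"
print(copy_from_location_bar(url, 0, len(url)))  # whole URL: escaped
print(copy_from_location_bar(url, 1, len(url)))  # partial: as displayed
```

Under the expected results, both calls would return the text exactly as displayed.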

I took a look at bug 105909 comment 43 and further, and as far as I can tell (please correct me if I'm wrong!) we're optimizing for a case where someone pastes a URL into an IRC client and that breaks the person who then wants to click on it? It sounds to me like decoding the ASCII input from the IRC client is the better way to go there.

I also thought that clipboard behaviour allowed us to copy different versions of content so that depending on what a target application asked for on paste, we'd either give them rich text, basic text, etc. Perhaps we should be copying escaped ASCII into the basic text buffer, but ASCII into the richer text buffers? That way we could let the client decide.

Regardless, the above is inconsistent behaviour, and without reading that bug, it's pretty hard for a user to understand why there's a difference. I support this bug staying OPEN, since while the behaviour may be intentional, that doesn't mean that it's right. :)
(In reply to comment #4)
> I took a look at bug 105909 comment 43 and further, and as far as I can tell
> (please correct me if I'm wrong!) we're optimizing for a case where someone
> pastes a URL into an IRC client and that breaks the person who then wants to
> click on it? It sounds to me like decoding the ASCII input from the IRC client
> is the better way to go there.

The message was that (some?) IRC clients don't _linkify_ unicode URLs. And this isn't limited to IRC clients at all. Bugzilla has the same problem, as you can see in your comment:

> Expected results:
>   Copying entire URL results in http://ru.wikipedia.org/wiki/Заглавная
(In reply to comment #5)
The development installation of Bugzilla has that fixed.
(In reply to comment #4)
> I believe that the expected results are that what's copied out of the location
> bar always looks like what's displayed in the location bar.

I’d say “consistency”, but yes, since I thought it was a Search bar “bug” at first.
(In reply to comment #6)
> The development installation of Bugzilla has that fixed.

Which doesn't really mean anything for compatibility with various applications at large...
(In reply to comment #4)
> I also thought that clipboard behaviour allowed us to copy different versions
> of content so that depending on what a target application asked for on paste,
> we'd either give them rich text, basic text, etc. Perhaps we should be copying
> escaped ASCII into the basic text buffer, but ASCII into the richer text
> buffers? That way we could let the client decide.

That's what we call "flavor" (or sometimes "flavour").
http://www.mozilla.org/xpfe/xptoolkit/DataFlavors.html

Actually, few applications, if any, will ever even try to listen
to anything other than the "text/unicode" flavor, so that's not a
solution between chat clients and the new URL bar.

In the long run, maybe we should do a kind of evangelism action
to spread, say, e.g. mime type "text/uri-list" which is defined
in RFC 2483, as a standard flavor for copy and paste URI with
specific encodings. However, we need another way, now.
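To make the flavor idea concrete, here is a rough sketch of a clipboard payload that carries both representations. `build_clipboard_flavors` is a hypothetical name, the mapping of flavors to contents is just one possible arrangement, and this is not the Mozilla clipboard API. The only detail taken from RFC 2483 is that entries in `text/uri-list` are terminated by CRLF.

```python
from typing import Dict
from urllib.parse import quote


def build_clipboard_flavors(displayed_url: str) -> Dict[str, str]:
    """Hypothetical sketch: one clipboard entry, two flavors."""
    # Keep reserved URL characters and existing %HH escapes untouched.
    escaped = quote(displayed_url, safe=":/?#[]@!$&'()*+,;=%")
    return {
        # What the user sees and selected in the text field.
        "text/unicode": displayed_url,
        # Escaped form for consumers that ask for a URI list.
        "text/uri-list": escaped + "\r\n",
    }


flavors = build_clipboard_flavors("http://localhost/тест")
print(flavors["text/unicode"])
print(flavors["text/uri-list"].strip())
```

As the comment notes, this only helps if the receiving application actually asks for the second flavor.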

Jesse said in bug 105909
>We should make sure power users have access to the escaped URL. 
>Perhaps copying from the address bar should automatically
>use the escaped version.

I totally agree that we should make sure power users have access to
the escaped URL. But I'm not too sure why copy and paste is the
access for power users. I'd like to make it clear for whom copy and
paste is. Power users or normal users? In other words, we should make
sure normal users have access to the unescaped URL.

In my opinion, we need another way to provide the escaped string. How about
a new UI to switch between escaped and unescaped?
(In reply to comment #8)
> (In reply to comment #4)
> > I also thought that clipboard behaviour allowed us to copy different versions
> > of content so that depending on what a target application asked for on paste,
> > we'd either give them rich text, basic text, etc. Perhaps we should be copying
> > escaped ASCII into the basic text buffer, but ASCII into the richer text
> > buffers? That way we could let the client decide.
> 
> That's what we call "flavor" (or sometimes "flavour").
> http://www.mozilla.org/xpfe/xptoolkit/DataFlavors.html
> 
> Actually, few applications, if any, will ever even try to listen
> to anything other than the "text/unicode" flavor, so that's not a
> solution between chat clients and the new URL bar.

That's only remotely related to this bug. An application accepting unicode strings won't necessarily present these strings as URLs. Also, this is about both desktop and Web applications. Being able to paste unicode URLs in Firefox doesn't mean that random Web apps will like this.

> Jesse said in bug 105909
> >We should make sure power users have access to the escaped URL. 
> >Perhaps copying from the address bar should automatically
> >use the escaped version.
> 
> I totally agree that we should make sure power users have access to
> the escaped URL. But I'm not too sure why copy and paste is the
> access for power users. I'd like to make it clear for whom copy and
> paste is. Power users or normal users? In other words, we should make
> sure normal users have access to the unescaped URL.

I doubt that normal users know what unicode or escaping means, so giving them a choice is pointless.
(In reply to comment #9)
> I doubt that normal users know what unicode or escaping means,

That's the whole point of this bug. They don't know what an escaped URL is, and
they just complain about it, i.e. this is a terrible user experience.
The problem is that the copied string does not match the selected string
in a textbox. This is a UI revolution. None of them has experienced that
ever. Probably even authors of UI guidelines have never assumed there
could exist such an application. I don't think any software designer
is allowed to change this basic behavior of "Copy", even when it's
technically possible.

> so giving them a choice is pointless.
It's not pointless at all to give a choice to power users, who know
the merits of escaping. Most users would be happy with Unicode.

Just add "Copy Location" item to the menu (either on favicon, textfield
or tab), and leave "Copy" as that of a *normal* textbox.
(In reply to comment #10)
> It's not pointless at all to give a choice to power users, who know
> the merits of escaping.

We don't want to break pasting URLs into third-party apps for non-power users.

> Most users would be happy with Unicode.

We're talking about interoperability. Having a readable address is secondary if the primary use case, following the link, breaks. See bug 105909 comment 43.
(In reply to comment #11)
> We don't want to break pasting URLs into third-party apps for non-power users.
I don't want that, either.

> We're talking about interoperability. Having a readable address
> is secondary if the primary use case, following the link, breaks. 
And I'm talking about how to meet both interoperability and UI consistency
at the same time. 

> See bug 105909 comment 43.
I wonder why you thought I hadn't read 105909 comment 43. Please see #8.

I'm quite sure I understand your points. And I'd like to know why you
refuse to discuss UI design. Is there anything other than interoperability?

Jesse, your opinion?
(In reply to comment #12)
> > We're talking about interoperability. Having a readable address
> > is secondary if the primary use case, following the link, breaks. 
> And I'm talking about how to meet both interoperability and UI consistency
> at the same time. 

A "Copy Location" command that works differently from "Copy" isn't consistent either, especially since the difference is neither self-explanatory nor easy to discover even for experienced users, and not understandable for most users.

> > See bug 105909 comment 43.
> I wonder why you thought I hadn't read 105909 comment 43. Please see #8.

Because you keep saying normal copying should default to the unescaped address, which would break pasting actions.

Note that many users don't know what to do with addresses that aren't fully linkified, so this is actually important.

And we haven't even started to think about what else could stop working if an application is designed to process escaped addresses but gets unescaped ones.
Or regarding Web applications, let's forget URL processing for a moment. What happens if a Web app or a simple page with a comment form works with a limited charset and you submit unicode text?

> I'm quite sure I understand your points. And I'd like to know why you
> refuse to discuss UI design. Is there anything other than interoperability?

I don't think there was a reasonable UI proposal so far.
(In reply to comment #13)
> A "Copy Location" command that works differently from "Copy" isn't consistent
> either, especially since the difference is neither self-explanatory nor easy to
> discover even for experienced users, and not understandable for most users.

So you mean "Copy Link Location" "Copy Image Location" in the context etc.
is not consistent either? That is an interesting opinion. If you like
"self-explanatory" just change the name from "Copy Location" to "Copy 
escaped URL" or anything you like. Probably beltzner should know the
best label for such menuitems.

If you don't mind, can you give me some comment on human interface design?
By "inconsistent", I mean:
>The problem is that the copied string does not match the selected string
>in a textbox. This is a UI revolution. None of them has experienced that
>ever. Probably even authors of UI guidelines have never assumed there
>could exist such an application. I don't think any software designer
>is allowed to change this basic behavior of "Copy", even when it's
>technically possible.

Don't mix the inconsistency of "Copy" and that of "Copy Link URL".
(In reply to comment #14)
> (In reply to comment #13)
> > A "Copy Location" command that works differently from "Copy" isn't consistent
> > either, especially since the difference is neither self-explanatory nor easy to
> > discover even for experienced users, and not understandable for most users.
> 
> So you mean "Copy Link Location" "Copy Image Location" in the context etc.
> is not consistent either? That is an interesting opinion.

No, I don't mean that.
You propose to have two copy commands for one text field, which is super confusing, and doesn't solve the problem that one of them (the default one as you propose) breaks third-party apps.

> If you like
> "self-explanatory" just change the name from "Copy Location" to "Copy 
> escaped URL" or anything you like. Probably beltzner should know the
> best label for such menuitems.

Most users don't know what escaped URLs are, and they can't judge whether or not they need them.

> If you don't mind, can you give me some comment on human interface design?
> By "inconsistent", I mean:

What I meant is that a second copy command that's labeled "Copy Location" can't be expected to behave differently from the Location(!) Bar's "Copy" command.
(In reply to comment #15)
> You propose to have two copy commands for one text field, which is super
> confusing, and doesn't solve the problem that one of them 
So you mean "Copy Link URL" menuitem on the *content* contextmenu is super
confusing, in the end. Otherwise it's fairly consistent in Firefox.

>(the default one as you propose) breaks third-party apps.
Nothing breaks third-party apps, unless you stop providing escaped URLs. 

> What I meant is that a second copy command that's labeled "Copy Location" can't
> be expected to behave differently from the Location(!) Bar's "Copy" command.
Then, probably we should give up providing an escaped URL via the URL bar.
The URL bar is not the only source of an escaped URL.

Anyway you don't understand "consistency" and "inconsistency". 
Please read UI guidelines before starting UI discussion.
Gnome:
http://library.gnome.org/devel/hig-book/stable/principles-consistency.html.en
Apple:
http://developer.apple.com/documentation/UserExperience/Conceptual/OSXHIGuidelines/XHIGHIDesign/chapter_5_section_2.html#//apple_ref/doc/uid/TP30000353-TPXREF103

You should not break third-party apps, just as you should not
break text selection and the "Copy" mechanism. 

In short, the requests are 
[1] Don't change the behavior of "Copy". 
[2] Provide a way to access escaped URL.

[2] is NOT required only on the location bar.
[1] is required within application-wide range, especially on textbox.
(In reply to comment #16)
> (In reply to comment #15)
> > You propose to have two copy commands for one text field, which is super
> > confusing, and doesn't solve the problem that one of them 
> So you mean "Copy Link URL" menuitem on the *content* contextmenu is super
> confusing, in the end.

No, I don't even think it's related to what we're discussing. Contrary to your proposal there aren't two copy commands for link locations.

I'll stop here because we're going round in circles.
(In reply to comment #4)
> I believe that the expected results are that what's copied out of the location
> bar always looks like what's displayed in the location bar.

Yep, this is pretty simple. Safari 3 on OS X behaves this way. What is the behavior of other browsers? 
This has the annoying side effect that using the location bar to add commentary to a link before copying it escapes all the spaces etc.

For example, when I am at http://example.com/ and I want to send that address to a friend:
1) Go to the location bar
2) Append " --> you should check this out, especially the BarQuux section"
3) Select all and copy
4) Paste in other app

Result: 
http://example.com/%20--%3E%20you%20should%20check%20this%20out,%20especially%20the%20BarQuux%20section

Also, it escapes anything that looks like it starts with a protocol identifier. For example:
1) Go to the location bar
2) Overwrite with "hey: don't forget to do bla"
3) Select all and copy
4) Paste in other app

Result:
hey:%20don't%20forget%20to%20do%20bla

If you change the colon to a comma before copying, the result is:
hey, don't forget to do bla

It actually took me a while to figure this out, and I doubt any normal user would get the reason.
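The two results above are consistent with a rule along the lines of "escape the copied text if it starts with something that looks like a scheme". The sketch below reproduces that observation only. `copy_whole_bar`, the scheme pattern, and the set of characters left unescaped are assumptions chosen to match the examples, not the actual Firefox logic.

```python
import re
from urllib.parse import quote

# Anything like "http:" or "hey:" at the start counts as a scheme here.
SCHEME_PREFIX = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def copy_whole_bar(text: str) -> str:
    """Hypothetical model: escape only text that looks like a URL."""
    if SCHEME_PREFIX.match(text):
        # Spaces and '>' get escaped; ':', '/', ',' and "'" are kept
        # (an assumption that matches the results quoted above).
        return quote(text, safe=":/,'")
    return text


print(copy_whole_bar("hey: don't forget to do bla"))
print(copy_whole_bar("hey, don't forget to do bla"))
```

With the colon, the text is treated as a URL and escaped; with the comma, it is copied unchanged.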
(In reply to comment #19)
The Location Bar isn't a random text area, so this seems secondary. It could be fixed by taking pageproxystate into account, though. Feel free to file a bug and cc me.
I would also think consistency and usefulness are at least as important as not breaking third-party applications. Also, if you can see and understand the Unicode text, you expect that the person you are communicating with has the same ability (I write an email to a person in Malayalam only when I know the person who receives it can read and understand it). Also, even if the pasted text is not linkified, it will work if you copy it and paste it into the location bar. As for other web applications, how can you expect a site to accept Unicode if it can't handle it? It is very useful to be able to read the URL even though it is linkified, as you can copy and paste it, or you could even type it. But escaped URLs are even a security risk, as you have no way of knowing what you are actually clicking.
(In reply to comment #18)
> (In reply to comment #4)
> > I believe that the expected results are that what's copied out of the location
> > bar always looks like what's displayed in the location bar.
> 
> Yep, this is pretty simple. Safari 3 on OS X behaves this way. What is the
> behavior of other browsers?

Safari 3 on Windows Vista and IE7 on Windows Vista also behave this way.

Note on IE7: If you go to <http://ru.wikipedia.org/>, you are redirected to <http://ru.wikipedia.org/wiki/%D0%97%D0%B0%D0%B3%D0%BB%D0%B0%D0%B2%D0%BD%D0%B0%D1%8F_%D1%81%D1%82%D1%80%D0%B0%D0%BD%D0%B8%D1%86%D0%B0>, so it's a moot point as to what is copied out of the address bar. But if you go to <http://ru.wikipedia.org/wiki/Заглавная_страница> directly it functions like Safari 3.
I don't really care what browser X is doing. Chances are that they didn't even think about it like we are. There's a real-world example in bug 105909 comment 43, there are (unintended!) real-world examples within this very bug, and "let's do what the others are doing" doesn't address this.
(In reply to comment #23)
> I don't really care what browser X is doing. Chances are that they didn't even
> think about it like we are. There's a real-world example in bug 105909 comment
> 43, there are (unintended!) real-world examples within this very bug, and
> "let's do what the others are doing" doesn't address this.
> 

This argument is most unconvincing. Change "browser X" to "default OS browsers" and re-evaluate.
IE has a number of annoyances that we shouldn't copy and Safari isn't perfect either. Like I said, chances are that they just didn't think about this. So I don't see why their behavior matters.
(In reply to comment #25)
> Like I said, chances are that they just didn't think about this. So I
> don't see why their behavior matters.
> 

The beauty of my metric is that their development processes don't matter; only their results do. Apps will have to deal with the default browser no matter what we do, and the escaped URLs suck, so let's fall in line.

This bug is assigned to nobody, so I'm taking and fixing. I'll ask for review in a bit.
Assignee: nobody → sayrer
Status: REOPENED → NEW
(In reply to comment #26)
> Apps will have to deal with the default browser no matter what we do

Obviously, they don't.
(In reply to comment #27)
> (In reply to comment #26)
> > Apps will have to deal with the default browser no matter what we do
> 
> Obviously, they don't.
> 

Sorry, one buggy IRC client is not a real world example. :)
It's not? What would you consider a real-world example?
OS: Linux → All
Hardware: PC → All
Btw, applications like that IRC client or Bugzilla aren't necessarily broken, they act(ed) correctly according to an outdated standard. E.g. "Octets must be encoded if they have no corresponding graphic character within the US-ASCII coded character set", RFC 1738.
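As an illustration of why such applications only linkify part of an unescaped address, here is a minimal sketch of a linkifier that follows the ASCII-only reading of that rule. The pattern and the `linkify_target` helper are assumptions for this example and are not taken from any particular IRC client or from Bugzilla.

```python
import re

# Accept only printable US-ASCII characters after the scheme.
ASCII_URL = re.compile(r"https?://[\x21-\x7E]+")


def linkify_target(message: str) -> str:
    """Return the part of the message that would become a link."""
    match = ASCII_URL.search(message)
    return match.group(0) if match else ""


unescaped = "see http://ru.wikipedia.org/wiki/Заглавная now"
escaped = ("see http://ru.wikipedia.org/wiki/"
           "%D0%97%D0%B0%D0%B3%D0%BB%D0%B0%D0%B2%D0%BD%D0%B0%D1%8F now")

print(linkify_target(unescaped))  # link stops before the Cyrillic part
print(linkify_target(escaped))    # whole address becomes the link
```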
If you fix this, what about URLs with spaces? Those will break for sure, no way applications can auto-link them. Unless there is some special treatment - no idea what other browsers do there. (Quite common especially for regular documents just made available on the web.)
Wikis use '_' for space in URLs, and escaped characters for space are already used in ASCII URLs; the issue here is about normal characters also being shown as escaped sequences.
(In reply to comment #22)
> 
> Safari 3 on Windows Vista and IE7 on Windows Vista also behave this way.
> 
> Note on IE7: If you go to <http://ru.wikipedia.org/>, you are redirected to
> <http://ru.wikipedia.org/wiki/%D0%97%D0%B0%D0%B3%D0%BB%D0%B0%D0%B2%D0%BD%D0%B0%D1%8F_%D1%81%D1%82%D1%80%D0%B0%D0%BD%D0%B8%D1%86%D0%B0>,
> so it's a moot point as to what is copied out of the address bar. But if you go
> to <http://ru.wikipedia.org/wiki/Заглавная_страница> directly
> it functions like Safari 3.

And Opera works fine. You get
<http://ru.wikipedia.org/wiki/Заглавная_страница>

(In reply to comment #33)
> Wikis use '_' for space in URLs, and escaped characters for space are already
> used in ASCII URLs; the issue here is about normal characters also being shown as
> escaped sequences.
>

I think it's very wiki (server-side) dependent.

(In reply to comment #32)
> If you fix this, what about URLs with spaces? Those will break for sure, no way
> applications can auto-link them. Unless there is some special treatment - no
> idea what other browsers do there. (Quite common especially for regular
> documents just made available on the web.)

Spaces can still be escaped. Such a method is actually used somewhere (and I think I saw such a bug closed; maybe you even closed it).
https://bugzilla.mozilla.org/show_bug.cgi?id=105909#c43:
I don't see the difference: I can open both links, or, if not, just use copy-paste (in very rare cases). But most users complain, and power users can use other means to get the URL escaped (that's why they're power users).

Also I got a good example from sp3000 during the talk on #developers.
We don't escape/unescape “#”, and it may break some applications too (some interpreters may treat it as the start of a comment rather than as part of the link). 
Also “#” breaks the links in IRC too.
"#" just isn't relevant here because it may not be escaped by the rules of this mechanism; it's irrelevant that it may happen to be problematic for some language medium if not treated properly by the rules of that language, or for some irc client.
I think a web browser does not have ANY right to interfere with the URL. It doesn't know anything about the web server that would accept the URL (how it will parse it, etc). Changing the URL could possibly change the server's response to it. When I click "Copy link", I expect the _link_ to be copied, not some obscure modification of it done entirely on the browser's initiative, which is completely uncalled for. The browser shouldn't do things it was not asked to do.
URL is not browser's property, so DON'T TOUCH IT please!
http://www.w3.org/TR/html401/appendix/notes.html#non-ascii-chars defines the way to treat Unicode URIs:

We recommend that user agents adopt the following convention for handling non-ASCII characters in such cases:

   1. Represent each character in UTF-8 (see [RFC2279]) as one or more bytes.
   2. Escape these bytes with the URI escaping mechanism (i.e., by converting each byte to %HH, where HH is the hexadecimal notation of the byte value).
...

So it is up to the UA to decode the Unicode URI and create a link.
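The two quoted steps can be written out directly. This is a minimal sketch of the convention as quoted, with a hypothetical function name `escape_non_ascii`. It touches only non-ASCII characters and leaves every ASCII character (including spaces and existing `%HH` sequences) alone.

```python
def escape_non_ascii(uri: str) -> str:
    """Apply the two quoted steps to every non-ASCII character."""
    out = []
    for ch in uri:
        if ord(ch) < 128:
            # ASCII characters are passed through unchanged.
            out.append(ch)
        else:
            # Step 1: represent the character in UTF-8.
            # Step 2: escape each resulting byte as %HH.
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(out)


print(escape_non_ascii("http://localhost/тест"))
```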

This behaviour adds much inconvenience in posting links to forums, blogs, Web-based email clients etc.  The best way (it seems to me) is to provide several formats: plain text the same as the text in the URI bar, and RTF and HTML with human-readable text and an encoded link.

This behaviour creates difficulties for users from non-Latin countries: we have many links with so-called “international” names, and the behaviour of these links is unpredictable.

Power users can use encoding mechanisms from free Web sites or client software.  Or you can add a “Copy Link” menu item as there was on “mailto:” links.
(In reply to comment #37)
> http://www.w3.org/TR/html401/appendix/notes.html#non-ascii-chars defines the
> way to treat Unicode URIs:
> 
> We recommend that user agents adopt the following convention for handling
> non-ASCII characters in such cases:
> 
>    1. Represent each character in UTF-8 (see [RFC2279]) as one or more bytes.
>    2. Escape these bytes with the URI escaping mechanism (i.e., by converting
> each byte to %HH, where HH is the hexadecimal notation of the byte value).
> ...
> 
> So it is up to the UA to decode the Unicode URI and create a link.

I'm not sure what you mean. The UA isn't in charge when it comes to creating hyperlinks. See for instance the two addresses in <http://blog.mozilla.com/gen/2008/05/23/firefox-3-utf-8-support-in-location-bar/#comment-43563>.
I mean that this way Mozilla becomes a third-party URI conversion tool for UAs incapable of using Unicode URIs.

Location copying is supposed to copy exactly the text in the Location Bar.  It is great that you decode the URI to a human-readable form, but the inability to copy-paste it breaks that great feature.
To be clear, this doesn't affect only non-ASCII, it affects URLs that include spaces and quotation marks, etc.

I'd be OK if this was fixed by only escaping if the selection includes the beginning of the URL.
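A rough sketch of that rule, for comparison with the current whole-URL check: `copy_selection` is a hypothetical helper, and leaving `:` and `/` unescaped is an assumption made for this example.

```python
from urllib.parse import quote


def copy_selection(displayed: str, start: int, end: int) -> str:
    """Hypothetical rule: escape whenever the selection starts at 0."""
    selection = displayed[start:end]
    if start == 0:
        # The selection includes the beginning of the URL, so it is
        # likely meant as a link and is escaped.
        return quote(selection, safe=":/")
    # Anything else is treated as plain text and copied as displayed.
    return selection


url = "http://example.com/a b"
print(copy_selection(url, 0, len(url)))   # starts at the beginning
print(copy_selection(url, 19, len(url)))  # a fragment from the middle
```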

Also, I very much doubt Rob is working on this bug.
Assignee: sayrer → nobody
Summary: Inconsistency when copying a non-ASCII URL/address (escaped/encoded when the whole URL is being copied, but not when only a part is copied) → URL is not escaped when a part is copied, but is escaped when the whole URL is copied
Is the behavior we currently use actually addressing something that's even still a problem? Hasn't pretty much everything gone UTF-8 by now? Certainly it's widespread enough that we can revisit this, because the existing behavior is really weird.
I got used to copying the link without http:// and it's OK for me, but the behavior is weird anyway. I agree and confirm.
Per policy at https://wiki.mozilla.org/Bug_Triage/Projects/Bug_Handling/Bug_Husbandry#Inactive_Bugs. If this bug is not an enhancement request or a bug not present in a supported release of Firefox, then it may be reopened.
Status: NEW → RESOLVED
Closed: 17 years ago → 6 years ago
Resolution: --- → INACTIVE