Closed Bug 918331 Opened 11 years ago Closed 5 years ago

Firefox sends illegal path instead of encoding the pipe character in URL address

Categories

(Firefox :: Address Bar, defect)

24 Branch
x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED DUPLICATE of bug 1163959

People

(Reporter: randy.hudson, Unassigned)

References

Details

(Whiteboard: [bugday-20130923][necko-backlog])

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/536.30.1 (KHTML, like Gecko) Version/6.0.5 Safari/536.30.1

Steps to reproduce:

Open a URL like "http://localhost/foo%7Cbar"


Actual results:

The first time, firefox sends the path "foo%7Cbar" in the HTTP request.  But if you refresh the page, the second time firefox sends the path "foo|bar".  Many applications attempt to construct a URI from this path, but '|' is an illegal character.

There are two issues:
1) Why does firefox send a different path the second time?  It should do the same thing both times.
2) The pipe character is illegal in a URI segment. Other browsers will properly encode this character, even if the user types it as '|' instead of %7C.  see http://tools.ietf.org/html/rfc3986#section-3.3
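
For reference, here is a minimal Python sketch of the escaping asked for in issue 2, using the standard library's `urllib.parse.quote`. The `path` value is just the example from the steps to reproduce. It illustrates RFC 3986-style escaping in general and is not a description of what Firefox does internally.

```python
from urllib.parse import quote, unquote

# Path as a user might type it, with a literal pipe.
path = "/foo|bar"

# quote() percent-encodes everything except letters, digits, "_.-~"
# and the characters listed in `safe` (default "/").
encoded = quote(path)
print(encoded)            # /foo%7Cbar

# Decoding gives back the original, so the escaped form loses nothing.
print(unquote(encoded))   # /foo|bar
```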
Thanks for reporting this. 
When I first open the URL indicated, I see http://localhost/foo|bar directly.
Component: Untriaged → Networking
Product: Firefox → Core
Whiteboard: [bugday-20130923]
See Also: → 941043
Will it be possible for this to get fixed in the near future or, at least, in Firefox 36?

Or, if it's not a priority, may I submit a proposed patch to fix this in Firefox 36, as long as it's before October 13th? I don't have a proposed patch prepared at this time, but that is something that I may be able to produce before then.


Thanks
wdyt?
Flags: needinfo?(valentin.gosu)
Whiteboard: [bugday-20130923] → [bugday-20130923][necko-backlog]
This is a duplicate of bug 1026938.
Status: UNCONFIRMED → RESOLVED
Closed: 8 years ago
Flags: needinfo?(valentin.gosu)
Resolution: --- → DUPLICATE
Blocks: 1064700
Blocks: url
Status: RESOLVED → REOPENED
Ever confirmed: true
Resolution: DUPLICATE → ---
> There are two issues:
> 1) Why does firefox send a different path the second time?  It should do the
> same thing both times.
I can't reproduce this, same as comment 1.
> 2) The pipe character is illegal in a URI segment. Other browsers will
> properly encode this character, even if the user types it as '|' instead of
> %7C.  see http://tools.ietf.org/html/rfc3986#section-3.3
I don't see why we should prohibit the pipe character in the path [1].
Other browsers also allow it.

I'd like to close this. What do you think, :annevk?

[1] https://url.spec.whatwg.org/#path-state
Flags: needinfo?(annevk)
The problem is that the address bar decodes percent-escaped characters. Now it's fine for the address bar to do post-processing on a URL for display purposes (e.g., use UTF-8 decode to show some Unicode rather than something that looks like garbage), but it shouldn't really affect what goes over the wire.

(Another way of solving this problem would be to just show the domain in the address bar, similar to Safari. And only show the full URL, without any kind of processing, when editing.)
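
A rough sketch of that separation, assuming a hypothetical `AddressBarEntry` class (not a real Firefox type): the decoded text exists only for display, and the request always uses the stored wire URL.

```python
from urllib.parse import unquote


class AddressBarEntry:
    """Hypothetical model: keeps the wire URL apart from its display form."""

    def __init__(self, wire_url: str):
        self.wire_url = wire_url                # sent over the network, never rewritten
        self.display_text = unquote(wire_url)   # decoded only for readability

    def url_for_request(self) -> str:
        # The display text is never fed back into the request.
        return self.wire_url


entry = AddressBarEntry("http://localhost/foo%7Cbar")
print(entry.display_text)       # http://localhost/foo|bar
print(entry.url_for_request())  # http://localhost/foo%7Cbar
```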
Flags: needinfo?(annevk)
(In reply to Anne (:annevk) from comment #6)
> The problem is that the address bar decodes percent-escaped characters. Now
> it's fine for the address bar to do post-processing on a URL for display
> purposes (e.g., use UTF-8 decode to show some Unicode rather than something
> that looks like garbage), but it shouldn't really affect what goes over the
> wire.
>
> (Another way of solving this problem would be to just show the domain in the
> address bar, similar to Safari. And only show the full URL, without any kind
> of processing, when editing.)

The URL presentation issue has been discussed in bug 1124600.
Packets on the wire work fine, not a problem.

What remains is whether the pipe character '|' is valid in a URL's path.
IMO we have no reason to prohibit this character.
Fair, resolving as INVALID then.
Status: REOPENED → RESOLVED
Closed: 8 years ago → 8 years ago
Resolution: --- → INVALID
Assignee: nobody → juhsu
Whiteboard: [bugday-20130923][necko-backlog] → [bugday-20130923][necko-active]
> The URL presentation issue has been discussed in bug 1124600.
> Packets on the wire work fine, not a problem.

The path that gets sent "on the wire" contains '|' UNESCAPED.  This is not a presentation problem.  %20 is displayed as space, but SPACE is never sent to the server in the path.

> I don't see why we should prohibit the pipe character in the path [1].
It must be encoded, just like SPACE.

> Other browsers also allow it.

Really!? Which browsers?  All other browsers I've checked escape '|' as %7C over the wire.
Status: RESOLVED → REOPENED
Resolution: INVALID → ---
(In reply to randy from comment #9)
> > The URL presentation issue has been discussed in bug 1124600.
> > Packets on the wire work fine, not a problem.
> 
> The path that gets sent "on the wire" contains '|' UNESCAPED.  This is not a
> presentation problem.  %20 is displayed as space, but SPACE is never sent to
> the server in the path.
> 
> > I don't see why we should prohibit the pipe character in the path [1].
> It must be encoded, just like SPACE.
> 
> > Other browsers also allow it.
> 
> Really!? Which browsers?  All other browsers I've checked escape '|' as %7C
> over the wire.
Okay, I see the problem.
I had misunderstood: an unescaped '|' in a URL's path should be treated as malformed.
I tested against a local server, and it happened to work; I guess something outside the browser handles the encoding.

But yes, we need to escape '|' as %7C.
Status: REOPENED → ASSIGNED
As far as I can tell Firefox and Safari do not escape that code point for URLs, but Chrome and Edge do. So it's a similar question to bug 1197123. What do we want to align with. I'm not sure that making decisions on a per code point basis will do us much good.
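
To make the disagreement concrete, here is a small Python sketch comparing the two rule sets for a single code point. The RFC 3986 sets follow its `pchar` grammar. The `WHATWG_PATH_ENCODE` set is my reading of the URL Standard's path percent-encode set as it stood around the time of this discussion, so treat it as an assumption rather than a copy of the spec.

```python
import string

# RFC 3986 section 3.3: characters allowed unescaped in a path segment.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
SUB_DELIMS = set("!$&'()*+,;=")
RFC3986_PCHAR = UNRESERVED | SUB_DELIMS | set(":@")

# Assumption: printable ASCII members of the WHATWG path percent-encode set
# (C0 controls and code points above "~" are handled in whatwg_escapes).
WHATWG_PATH_ENCODE = set(' "#<>?`{}')


def rfc3986_requires_escape(ch: str) -> bool:
    """True if RFC 3986 does not allow `ch` literally in a path segment."""
    return ch not in RFC3986_PCHAR


def whatwg_escapes(ch: str) -> bool:
    """True if a WHATWG-style URL parser would percent-encode `ch` in a path."""
    return ch in WHATWG_PATH_ENCODE or ord(ch) < 0x20 or ord(ch) > 0x7E


for ch in "|{ ":
    print(repr(ch), rfc3986_requires_escape(ch), whatwg_escapes(ch))
```

Under these assumptions the pipe is the odd one out: RFC 3986 wants it escaped, while the WHATWG set leaves it alone.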
(In reply to Anne (:annevk) from comment #11)
> As far as I can tell Firefox and Safari do not escape that code point for
> URLs, but Chrome and Edge do. So it's a similar question to bug 1197123.
> What do we want to align with. I'm not sure that making decisions on a per
> code point basis will do us much good.

We already have a meta bug 1064700 for percent encoding.
Let's move the discussion there.
Whiteboard: [bugday-20130923][necko-active] → [bugday-20130923][necko-next]
Whiteboard: [bugday-20130923][necko-next] → [bugday-20130923][necko-backlog]
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P1
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: P1 → P3
Assignee: juhsu → nobody
Status: ASSIGNED → NEW

hi, any progress here? I encountered this bug using 60.5.1esr (64-bit) on opensuse leap 15.

this bug has been open for 6 years? really??? the first request to a page containing | in the URL is correctly encoded as %7C (at least on the wire). all subsequent requests are not.

so the code within Firefox is there .. why does Fx do something different for the second request? (ok, that is why it's a bug). I'm just wondering ...

please, consider changing Fx to obey RFC 7230 and RFC 3986. please!

(In reply to randy from comment #9)
> Really!? Which browsers? All other browsers I've checked escape '|' as %7C
> over the wire.

As far as I can tell Firefox and Safari do the same thing here:

https://jsdom.github.io/whatwg-url/#url=aHR0cHM6Ly9leGFtcGxlLmNvbS98&base=YWJvdXQ6Ymxhbms=

This is also in line with https://url.spec.whatwg.org/.

So resolving as INVALID again. Filed https://bugs.chromium.org/p/chromium/issues/detail?id=943026 on Chrome.

Status: NEW → RESOLVED
Closed: 8 years ago → 5 years ago
Resolution: --- → INVALID

Sorry, on reflection I guess there's still the original bug, which is about decoding happening in the address bar, but that's not in the networking code.

No longer blocks: url, 1064700
Component: Networking → Address Bar
Product: Core → Firefox
Status: RESOLVED → UNCONFIRMED
Ever confirmed: false
Resolution: INVALID → ---
Priority: P3 → --

sudo tcpdump -nn -i eth0 -A -X tcp port 80 or tcp port 81 and host threatcon5.xn--bckbone-5wa.eu
..

wget -O/dev/null 'http://threatcon5.bäckbone.eu/test|page.html'

    0x0040:  4dfa c9b5 553a 6e2e 4745 5420 2f74 6573  M...U:n.GET./tes
    0x0050:  7425 3743 7061 6765 2e68 746d 6c20 4854  t%7Cpage.html.HT    <-----
    0x0060:  5450 2f31 2e31 0d0a 5573 6572 2d41 6765  TP/1.1..User-Age
    0x0070:  6e74 3a20 5767 6574 2f31 2e31 392e 3520  nt:.Wget/1.19.5.
    0x0080:  286c 696e 7578 2d67 6e75 290d 0a41 6363  (linux-gnu)..Acc
    0x0090:  6570 743a 202a 2f2a 0d0a 4163 6365 7074  ept:.*/*..Accept
    0x00a0:  2d45 6e63 6f64 696e 673a 2069 6465 6e74  -Encoding:.ident
    0x00b0:  6974 790d 0a48 6f73 743a 2074 6872 6561  ity..Host:.threa
    0x00c0:  7463 6f6e 352e 786e 2d2d 6263 6b62 6f6e  tcon5.xn--bckbon
    0x00d0:  652d 3577 612e 6575 0d0a 436f 6e6e 6563  e-5wa.eu..Connec
    0x00e0:  7469 6f6e 3a20 4b65 6570 2d41 6c69 7665  tion:.Keep-Alive

curl -o/dev/null 'http://threatcon5.bäckbone.eu/test|page.html'

    0x0030:  125b e436 8018 00e0 e41b 0000 0101 080a  .[.6............
    0x0040:  4dfc 7787 553c 1bfd 4745 5420 2f74 6573  M.w.U<..GET./tes
    0x0050:  747c 7061 6765 2e68 746d 6c20 4854 5450  t|page.html.HTTP    <-----
    0x0060:  2f31 2e31 0d0a 486f 7374 3a20 7468 7265  /1.1..Host:.thre
    0x0070:  6174 636f 6e35 2e78 6e2d 2d62 636b 626f  atcon5.xn--bckbo
    0x0080:  6e65 2d35 7761 2e65 750d 0a55 7365 722d  ne-5wa.eu..User-
    0x0090:  4167 656e 743a 2063 7572 6c2f 372e 3630  Agent:.curl/7.60
    0x00a0:  2e30 0d0a 4163 6365 7074 3a20 2a2f 2a0d  .0..Accept:.*/*.

Firefox 60.5.1esr opensuse leap 15 (64bit)

    0x0040:  4dfc f202 553c 9678 4745 5420 2f74 6573  M...U<.xGET./tes
    0x0050:  747c 7061 6765 2e68 746d 6c20 4854 5450  t|page.html.HTTP    <-----
    0x0060:  2f31 2e31 0d0a 486f 7374 3a20 7468 7265  /1.1..Host:.thre
    0x0070:  6174 636f 6e35 2e78 6e2d 2d62 636b 626f  atcon5.xn--bckbo
    0x0080:  6e65 2d35 7761 2e65 750d 0a55 7365 722d  ne-5wa.eu..User-
    0x0090:  4167 656e 743a 204d 6f7a 696c 6c61 2f35  Agent:.Mozilla/5
    0x00a0:  2e30 2028 5831 313b 204c 696e 7578 2078  .0.(X11;.Linux.x
    0x00b0:  3836 5f36 343b 2072 763a 3630 2e30 2920  86_64;.rv:60.0).
    0x00c0:  4765 636b 6f2f 3230 3130 3031 3031 2046  Gecko/20100101.F
    0x00d0:  6972 6566 6f78 2f36 302e 300d 0a41 6363  irefox/60.0..Acc
    0x00e0:  6570 743a 2074 6578 742f 6874 6d6c 2c61  ept:.text/html,a

wget -V | head -1
GNU Wget 1.19.5 built on linux-gnu.

curl -V | head -1
curl 7.60.0 (x86_64-suse-linux-gnu) libcurl/7.60.0 OpenSSL/1.1.0i zlib/1.2.11 libidn2/2.0.4 libpsl/0.20.1 (+libidn2/2.0.4) libssh/0.7.5/openssl/zlib nghttp2/1.31.1

date -u
Mon 18 Mar 21:11:04 UTC 2019
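
For anyone who wants to double-check the marked rows without reading hex, here is a short Python sketch that decodes them. The two constants are copied from the dumps above (the wget row, and the row that is identical for curl and Firefox); `decode_row` is just a helper name made up for this example.

```python
# Hex words copied from the marked rows of the dumps above.
WGET_ROW = "7425 3743 7061 6765 2e68 746d 6c20 4854"
CURL_ROW = "747c 7061 6765 2e68 746d 6c20 4854 5450"  # same bytes in the Firefox dump


def decode_row(row: str) -> str:
    """Turn the hex column of a tcpdump -X row into ASCII text."""
    return bytes.fromhex(row.replace(" ", "")).decode("ascii")


print(decode_row(WGET_ROW))  # t%7Cpage.html HT
print(decode_row(CURL_ROW))  # t|page.html HTTP
```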

I assume the problem happens when you visit http://localhost/foo%7Cbar and then confirm again with Enter what is already in the address bar. On the first visit, even though we show |, we send %7C, and the same is true if you hit refresh after that.
The problem is that the address bar just takes the current value of the input field when you confirm its contents with Enter. It's currently stateless: it doesn't know that what you pasted before was encoded. We could maybe add a heuristic that compares what is in the input field with the currentURI, though that would break the opposite use case: what if I type http://localhost/foo%7Cbar, then paste http://localhost/foo|bar and press Enter?
What if the user actually edits the URL, adding more params? How would we handle that edit when we have two versions of the URL?

If we revert to the currentURI on focus, URL editing may be less handy for the user, because the URL is less readable.
We could stop decoding %7C, but then how many other characters should we do that with?

In the end, the remaining part here is pretty much bug 1163959; indeed, some of this discussion is mirrored there, and it's not a trivial choice. I'll keep that bug because it looks more actionable and has more technical discussion.
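
To illustrate the heuristic floated in this comment, here is a minimal Python sketch. `url_to_load` and its parameters are hypothetical names for this example, not Firefox code, and plain `unquote` stands in for whatever decoding the address bar really applies.

```python
from urllib.parse import unquote


def url_to_load(input_text: str, current_uri: str) -> str:
    """Hypothetical heuristic: reuse the loaded URI if the field looks unedited."""
    if input_text == unquote(current_uri):
        # The field still shows the decoded form of the loaded page.
        return current_uri
    # Anything else is treated as user input and taken as typed.
    return input_text


current = "http://localhost/foo%7Cbar"
print(url_to_load("http://localhost/foo|bar", current))      # http://localhost/foo%7Cbar
print(url_to_load("http://localhost/foo|bar?x=1", current))  # http://localhost/foo|bar?x=1
```

The sketch also shows the weakness: a user who deliberately pastes http://localhost/foo|bar over the loaded page gets the encoded URI anyway, because the two cases look identical to a stateless comparison.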

Status: UNCONFIRMED → RESOLVED
Closed: 5 years ago → 5 years ago
Resolution: --- → DUPLICATE

hi Marco, thanks for making progress here.

(I try to avoid assuming)

We use an Atlassian product called Confluence, which stores wiki pages. Some of them contain a | in their subject, so it becomes part of the web address.
If I open a fresh Firefox and type in part of one of those wiki page titles, it shows me the pages I had open in the past.
If I choose the one page with | in the URL/subject, I get this:

HTTP Status 400 – Bad Request

Type Exception Report

Message Invalid character found in the request target. The valid characters are defined in RFC 7230 and RFC 3986

Description The server cannot or will not process the request due to something that is perceived to be a client error (e.g., malformed request syntax, invalid request message framing, or deceptive request routing).

Exception

java.lang.IllegalArgumentException: Invalid character found in the request target. The valid characters are defined in RFC 7230 and RFC 3986
org.apache.coyote.http11.Http11InputBuffer.parseRequestLine(Http11InputBuffer.java:474)
org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:294)
org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:770)
org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1415)
org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
java.lang.Thread.run(Thread.java:748)

Note The full stack trace of the root cause is available in the server logs.
Apache Tomcat/9.0.12

You see? I don't have to change the URL to reproduce it. I just access webpages I had open before. IMHO Firefox should not change %7C back to | only to show it to me in the address bar. | and many other characters are not allowed within a URL. (!!!)

HTH
Regards
Stefan K.
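
For context, here is a simplified Python sketch of the kind of request-line check that yields a 400 like the one above. It is not Tomcat's actual code. `ALLOWED_TARGET` and `status_for` are made-up names, and the check is deliberately loose: it does not verify that "%" is followed by two hex digits.

```python
import re

# Characters RFC 3986 permits in a path or query, plus "%" for escapes.
# Simplification: "%" is accepted on its own, without validating the hex digits.
ALLOWED_TARGET = re.compile(r"[A-Za-z0-9\-._~!$&'()*+,;=:@/?%]*")


def status_for(request_line: str) -> int:
    """Return 400 if the request target has disallowed characters, else 200."""
    method, target, version = request_line.split(" ")
    return 200 if ALLOWED_TARGET.fullmatch(target) else 400


print(status_for("GET /test%7Cpage.html HTTP/1.1"))  # 200
print(status_for("GET /test|page.html HTTP/1.1"))    # 400
```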
