Closed Bug 244754 Opened 20 years ago Closed 20 years ago

URL is not shown in the status bar when I point at a link on a page encoded as 8-bit Unicode

Categories: Core :: Networking, defect
Severity: normal
Status: RESOLVED FIXED
People: (Reporter: berndt.soderstrom, Assigned: jshin1987)
Keywords: intl
Attachments: (3 files, 1 obsolete file)

User-Agent:       Mozilla/5.0 (Windows; U; Win 9x 4.90; en-US; rv:1.7) Gecko/20040514
Build Identifier: Mozilla/5.0 (Windows; U; Win 9x 4.90; en-US; rv:1.7) Gecko/20040514

When you point at a link in a page that is encoded as ISO-8859-1, the URL of the
file that the link refers to is shown on the status bar, as it should be.
However, when you point at a link in a page that is encoded as UTF-8, the URL of
the file that the link refers to will not appear in the status bar.

Reproducible: Always
Steps to Reproduce:
1. Just point at a link within a page that is encoded as UTF-8.
2. Move the mouse pointer to another link within the same page.
Actual Results:  
The URLs of the files that the links refer to didn't appear on the status bar.

Expected Results:  
The URLs of the files that the links refer to should have appeared on the status
bar.
Sample page showing this problem?  Testcase showing this problem?
Berndt, do not send email to me with details of the problem.  Please make
comments and attach files directly to the bug.  Thanks.
I've found out that the problem depends on whether the URL of the link contains
non-ASCII characters. Within ISO-8859-1 documents, the URL of the link that you
point at will be shown in the status bar, regardless of what characters there
are in the URL.

Within UTF-8 documents, the URL of the link that you point at will be shown in
the status bar only if it contains no non-ASCII characters.

I have attached two test files, one encoded as UTF-8 and another encoded as
ISO-8859-1. Download both files to the same directory; the name of the directory
must contain at least one non-ASCII character (e.g. Ã or è) in order for you to
see the bug.
So it's the directory name that has to have a non-ascii character?  Putting a
non-ascii character in the href itself (in the document) doesn't show the bug?
(In reply to comment #6)
> So it's the directory name that has to have a non-ascii character?  Putting a
> non-ascii character in the href itself (in the document) doesn't show the bug?

Yes.
Can you reproduce this with a non-ascii path on an HTTP server?  Or only with a
local file?
I reproduced this both with local files and on an HTTP server. The actual
directory name got strangely corrupted when creating it over FTP, but that is a
separate issue.
http://smontagu.org/testcases/%88%91%88/test1.html - the UTF-8 file
http://smontagu.org/testcases/%88%91%88/test2.html - the ISO-8859-1 file
Status: UNCONFIRMED → NEW
Ever confirmed: true
Simon, thanks for the testcase!  I assume that directory name is in ISO-8859-1?

Darin, it sounds like a relative URI resolution issue (we fail to do it right,
so end up with either no URI or a bogus URI that can't be decoded into Unicode).

The URI objects in question are created with
nsContentUtils::NewURIWithDocumentCharset.  Could it be a problem if the base
URI has one charset set but the relative URI is getting a different charset?
Assignee: general → darin
Component: Browser-General → Networking
OS: Windows ME → All
QA Contact: general → benc
Hardware: PC → All
(In reply to comment #10)
> Simon, thanks for the testcase!  I assume that directory name is in ISO-8859-1?

If anything it's in cp862, but I have no idea why. Maybe the Windows FTP client
translates automatically from ISO-8859-8 to cp862? When we do display it in the
status bar, we seem to display it as ISO-8859-1.
It's not just an issue with relative URIs. If you change the encoding of this
page to UTF-8 and hover over the links in comment 9, nothing appears in the
status bar.
That's because the links in comment 9 get converted into URI objects based on
the page encoding (which means that we unescape and then treat the resulting
bytes as being in the page encoding).
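The failure mode described in the last two comments can be sketched outside Gecko (a Python illustration, not Mozilla code; the byte values are taken from the %88%91%88 testcase path):

```python
from urllib.parse import unquote_to_bytes

# The escaped path segment from the testcase URLs in comment 9.
escaped = "%88%91%88"
raw = unquote_to_bytes(escaped)          # b'\x88\x91\x88'

# Interpreted as ISO-8859-1, every byte maps to some character, so the
# conversion succeeds and the status bar has something to display.
as_latin1 = raw.decode("iso-8859-1")

# Interpreted as UTF-8 (the page encoding of test1.html), the same bytes
# are an invalid sequence: the conversion fails, and before the fix the
# status bar showed nothing at all.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(utf8_ok)  # False
```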
*** Bug 257481 has been marked as a duplicate of this bug. ***
Note that comment 13 is wrong.  The real problem is described in bug 257481
comment 1...  The fix suggested there is pretty trivial; some feedback on the
suggestion would be much appreciated.
(In reply to bug 257481 comment #1)
> Should we just use the escaped URI in the status bar in cases when the
> conversion fails, perhaps?  That may have security implications, but so does
> showing nothing...

  I'd agree it's better to show the escaped URI than to show nothing or to show
some garbage (as MS IE does). There might be security implications, but in a
sense we'd 'fully disclose' the URI that way (instead of 'hiding' it), although
we may still 'obscure' it somewhat.
 
> Note: the relevant code is nsWebShell::OnOverLink

Thanks for the pointer. If we take the suggested path, I guess it's better to
deal with it at call sites (if appropriate/necessary) than to tweak the API.
Keywords: intl
Hmm... you mean then change the UnEscapeURIForUI api?

I'd suggest checking its callers. Chances are they all want to do things that
way and we do indeed want to roll this change into the unescaping code...
related to Bug 229546?
Blocks: 229546
*** Bug 276516 has been marked as a duplicate of this bug. ***
(In reply to comment #17)
> Hmm... you mean than change the UnEscapeURIForUI api?
> 
> I'd suggest checking its callers. Chances are they all want to do things that
> way and we do indeed want to roll this change into the unescaping code...

Like this? Perhaps we should indicate that we fell back to the escaped URI via
the return value.

  // in case of failure, return escaped URI
  if (NS_FAILED(convertURItoUnicode(
                PromiseFlatCString(aCharset), unescapedSpec, PR_TRUE, _retval)))
    // use UTF-8 for IDN in auth part
    CopyUTF8toUTF16(aURIFragment, _retval); 
  return NS_OK;
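For illustration only, the fallback the snippet above implements can be sketched outside Mozilla (Python as a stand-in; `unescape_uri_for_ui` is a hypothetical name, not the real XPCOM method, and `convertURItoUnicode` / `CopyUTF8toUTF16` are Mozilla internals):

```python
from urllib.parse import unquote_to_bytes

def unescape_uri_for_ui(charset: str, uri_fragment: str) -> str:
    """Try to convert the unescaped bytes using the document charset;
    on failure, return the escaped URI fragment unchanged (mirroring
    the CopyUTF8toUTF16(aURIFragment, _retval) fallback above)."""
    raw = unquote_to_bytes(uri_fragment)
    try:
        return raw.decode(charset)
    except (UnicodeDecodeError, LookupError):
        # In case of failure, return the escaped URI so the status bar
        # shows something rather than nothing.
        return uri_fragment

print(unescape_uri_for_ui("iso-8859-1", "%E9tude.html"))  # étude.html
print(unescape_uri_for_ui("utf-8", "%88%91%88"))          # %88%91%88 (fallback)
```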
Yes, something like that.  If you want to have a return code to indicate this,
that's ok, though not really necessary... in that case it should be a success
code, though.
I went through all the callers and some of them do their own error-processing.
Should I get rid of them?

http://lxr.mozilla.org/seamonkey/source/docshell/base/nsDocShell.cpp#2796

2796           rv = textToSubURI->UnEscapeURIForUI(charset, spec, formatStrs[0]);
2797         if (NS_FAILED(rv)) {
2798           CopyASCIItoUCS2(spec, formatStrs[0]);
2799           rv = NS_OK;
2800         }

http://lxr.mozilla.org/seamonkey/source/content/html/document/src/nsMediaDocument.cpp#324

324       if (NS_SUCCEEDED(rv))
325         rv = textToSubURI->UnEscapeURIForUI(docCharset, fileName, fileStr);
326     }
327     if (fileStr.IsEmpty())
328       CopyUTF8toUTF16(fileName, fileStr);
329   }

http://lxr.mozilla.org/seamonkey/source/dom/src/base/nsLocation.cpp#357
357         rv = textToSubURI->UnEscapeURIForUI(charset, ref, unicodeRef);
358       }
359       
360       if (NS_FAILED(rv)) {
361         // Oh, well.  No intl here!
362         NS_UnescapeURL(ref);
363         CopyASCIItoUTF16(ref, unicodeRef);
364         rv = NS_OK;
365       }
366     }
Yes. Doing the error-processing in a central place is exactly the point.
Attached patch patch (obsolete) — Splinter Review
Attachment #173138 - Flags: superreview?(bzbarsky)
Attachment #173138 - Flags: review?(darin)
Comment on attachment 173138 [details] [diff] [review]
patch

>Index: intl/uconv/idl/nsITextToSubURI.idl
>+   *  <li> In case of the conversion error, the URI fragment (escaped) is 

"a conversion error"

>+   *  <li> Always succeeeds (callers don't need to do the error checking)

"do error checking"

>Index: docshell/base/nsDocShell.cpp
>-          rv = textToSubURI->UnEscapeURIForUI(charset, spec, formatStrs[0]);
>-        if (NS_FAILED(rv)) {
>-          CopyASCIItoUCS2(spec, formatStrs[0]);
>-          rv = NS_OK;
>-        }
>+          // UnEscapeURIForUI always succeeds 
>+          textToSubURI->UnEscapeURIForUI(charset, spec, formatStrs[0]);

You still need to set rv = NS_OK.

What about the other callers?  I see at least a few still effectively doing
their own fallback.  In particular, nsMediaDocument and
nsExternalHelperAppService.cpp (callers of UnescapeFragment).
In the case of nsMediaDocument, it's a bit different (that was in my first
patch, but not included in the patch uploaded). Although it's very unlikely,
do_GetService may fail. The same is true of UnescapeFragment (there are other
potential failure causes, so callers still need to handle them).

    if (!fileName.IsEmpty()) {
      nsresult rv;
      nsCOMPtr<nsITextToSubURI> textToSubURI = 
        do_GetService(NS_ITEXTTOSUBURI_CONTRACTID, &rv);
      if (NS_SUCCEEDED(rv))
        rv = textToSubURI->UnEscapeURIForUI(docCharset, fileName, fileStr);
    }
    if (fileStr.IsEmpty())
      CopyUTF8toUTF16(fileName, fileStr);
But then the same argument applies to docshell....
Attached patch update — Splinter Review
How about this?
Attachment #173138 - Attachment is obsolete: true
Attachment #173161 - Flags: superreview?(bzbarsky)
Attachment #173161 - Flags: review?(darin)
Attachment #173138 - Flags: superreview?(bzbarsky)
Attachment #173138 - Flags: review?(darin)
Comment on attachment 173161 [details] [diff] [review]
update

sr=bzbarsky
Attachment #173161 - Flags: superreview?(bzbarsky) → superreview+
Attachment #173161 - Flags: review?(darin) → review+
Is there a nightly with this bug fixed?
Flags: blocking1.8b2?
Flags: blocking-aviary1.1?
Not yet.  To jshin to get this landed.
Assignee: darin → jshin1987
oops. sorry I checked this in on Feb 22nd, but forgot to mark it as fixed. 
I've just verified that it's fixed in my trunk build. 
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Flags: blocking1.8b2?
Flags: blocking-aviary1.1?