Closed Bug 1421214 Opened 6 years ago Closed 6 years ago

Links don't work when anchor fragment identifier contains §

Categories

(Core :: DOM: Navigation, defect, P2)

57 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla59
Tracking Status
firefox-esr52 --- unaffected
firefox57 --- wontfix
firefox58 - fixed
firefox59 - fixed

People

(Reporter: wolfgang.lenssen, Assigned: alchen)

References

Details

(Keywords: regression, reproducible)

Attachments

(2 files, 2 obsolete files)

User Agent: Mozilla/5.0 (Windows NT 6.1; rv:57.0) Gecko/20100101 Firefox/57.0
Build ID: 20171112125346

Steps to reproduce:

After update on Firefox 57.0 Quantum links wich targets include the §-sign do not work.
For example:
http://www.vkm-baden.de/infothek/ar-m.htm


Actual results:

Firefox does not react by clicks on the hyperlinks
For example:
http://www.vkm-baden.de/infothek/ar-m.htm#§1


Expected results:

Firefox should show the place with the target
For example:
<a name="§1"></a>
The same is by:
<a id="§1"></a>
So I think the problem is the §-sign
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:59.0) Gecko/20100101 Firefox/59.0
20171127220446

No duplicates that I can find, and it works in Opera, so confirming.
Status: UNCONFIRMED → NEW
Has STR: --- → yes
Component: Untriaged → Document Navigation
Ever confirmed: true
Product: Firefox → Core
Summary: Firefox does not work by links wich target includes the §-sign → Links don't work when anchor fragment identifier contains §
Attached file utf-8_test.html
It works fine on a UTF-8 page. Link in comment 0 is ISO-8859-1 (§ is part of that character set, so it should work).
Pushlog
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=1b4c59eef820b46eb0037aca68f83a15088db45f&tochange=ab2d700fda2b4934d24227216972dce9fac19b74

Could bug 1391978 have caused this? If not, do you know what did?
Has Regression Range: --- → yes
Flags: needinfo?(m_kato)
it was the charset-definition!
With charset=utf-8 it works fine
Thank You!
(In reply to Gingerbread Man from comment #3)
> Pushlog
> https://hg.mozilla.org/integration/mozilla-inbound/
> pushloghtml?fromchange=1b4c59eef820b46eb0037aca68f83a15088db45f&tochange=ab2d
> 700fda2b4934d24227216972dce9fac19b74
> 
> Could bug 1391978 have caused this? If not, do you know what did?

Cannot.  This is another bug.  You should use mozgression to find the regression.
Flags: needinfo?(m_kato)
Hi Alphan, 
You've been working on utf-8 and anchor related bugs recently. Mind taking a look at this?
Flags: needinfo?(alchen)
Priority: -- → P2
Assignee: nobody → alchen
Flags: needinfo?(alchen)
The symptom related to bug 1380323.
It should be a regression of bug 1380323.
When revised the behavior of scrolling to fragments, I don't try GoToAnchor() with utf8 encoding before document's charset.
Need to add this part back to keep the functionality.
Attachment #8933247 - Flags: review?(bugs)
Trying to understand what was changed in the regressing bug....
Comment on attachment 8933247 [details] [diff] [review]
Try GoToAnchor() with utf-8 encoding before document's charset. r?smaug

># HG changeset patch
># User Alphan Chen <alchen@mozilla.com>
># Date 1512036985 -28800
>#      Thu Nov 30 18:16:25 2017 +0800
># Node ID 671b06a33fa0563244f810c28e3ff763dd12137a
># Parent  40df5dd35fdb7ce3652fe4448ac8961c075c928e
>Bug 1421214 - Try GoToAnchor() with utf-8 encoding before document's charset.
>
>diff --git a/docshell/base/nsDocShell.cpp b/docshell/base/nsDocShell.cpp
>--- a/docshell/base/nsDocShell.cpp
>+++ b/docshell/base/nsDocShell.cpp
>@@ -11777,20 +11777,28 @@ nsDocShell::ScrollToAnchor(bool aCurHasR
>     // http://www.w3.org/TR/html4/appendix/notes.html#h-B.2.1
> 
>     // We try the UTF-8 string first, and then try the document's
>     // charset (see below).  If the string is not UTF-8,
>     // conversion will fail and give us an empty Unicode string.
>     // In that case, we should just fall through to using the
>     // page's charset.
>     nsresult rv = NS_ERROR_FAILURE;
>-    NS_ConvertUTF8toUTF16 uStr(aNewHash);
>-    if (!uStr.IsEmpty()) {
>-      rv = shell->GoToAnchor(uStr, scroll,
>+    rv = shell->GoToAnchor(NS_ConvertUTF8toUTF16(aNewHash), scroll,
>+                           nsIPresShell::SCROLL_SMOOTH_AUTO);
I tried and failed to understand why we want to remove if (!uStr.IsEmpty()) { check.

>+
>+    if (NS_FAILED(rv)) {
>+      char* str = ToNewCString(aNewHash);
>+      if (!str) {
>+        return NS_ERROR_OUT_OF_MEMORY;
>+      }
>+      nsUnescape(str);
>+      rv = shell->GoToAnchor(NS_ConvertUTF8toUTF16(str), scroll,
>                              nsIPresShell::SCROLL_SMOOTH_AUTO);
>+      free(str);
ok, so you're binging back the unescaped option, not utf8.


>-    // If UTF-8 URI failed then try to assume the string as a
>-    // document's charset.
>     if (NS_FAILED(rv)) {
>-      auto encoding = GetDocumentCharacterSet();
>       char* tmpstr = ToNewCString(mScrollToRef);
>       if (!tmpstr) {
>         return;
>       }
>       nsUnescape(tmpstr);
>       nsAutoCString unescapedRef;
>       unescapedRef.Assign(tmpstr);
>       free(tmpstr);
> 
>-      rv = encoding->DecodeWithoutBOMHandling(unescapedRef, ref);
>-
>-      if (NS_SUCCEEDED(rv) && !ref.IsEmpty()) {
>-        rv = shell->GoToAnchor(ref, mChangeScrollPosWhenScrollingToRef);
>+      rv = shell->GoToAnchor(NS_ConvertUTF8toUTF16(unescapedRef), mChangeScrollPosWhenScrollingToRef);
Also here. Why we want to remove ref.IsEmpty() check?


>+      // If UTF-8 URI failed then try to assume the string as a
>+      // document's charset.
>+      if (NS_FAILED(rv)) {
>+        auto encoding = GetDocumentCharacterSet();
While you're here, could you switch auto to the actual type. With auto I had to check GetDocumentCharacterSet return value
to understand what thing it gives. It is not obvious from the name of the method what it returns.

Can we get some tests?

(and sorry, this isn't too good review, but I need to still understand all the changes in bug 1380323 too and the spec. So I basically need to read this code couple of times.)
Attachment #8933247 - Flags: review?(bugs) → review-
(In reply to Olli Pettay [:smaug] (ni?s and f?s processed after r?s) from comment #10)
> Comment on attachment 8933247 [details] [diff] [review]
> Try GoToAnchor() with utf-8 encoding before document's charset. r?smaug
> > 
> >     // We try the UTF-8 string first, and then try the document's
> >     // charset (see below).  If the string is not UTF-8,
> >     // conversion will fail and give us an empty Unicode string.
> >     // In that case, we should just fall through to using the
> >     // page's charset.
> >     nsresult rv = NS_ERROR_FAILURE;
> >-    NS_ConvertUTF8toUTF16 uStr(aNewHash);
> >-    if (!uStr.IsEmpty()) {
> >-      rv = shell->GoToAnchor(uStr, scroll,
> >+    rv = shell->GoToAnchor(NS_ConvertUTF8toUTF16(aNewHash), scroll,
> >+                           nsIPresShell::SCROLL_SMOOTH_AUTO);
> I tried and failed to understand why we want to remove if (!uStr.IsEmpty())
> { check.


> >-    // If UTF-8 URI failed then try to assume the string as a
> >-    // document's charset.
> >     if (NS_FAILED(rv)) {
> >-      auto encoding = GetDocumentCharacterSet();
> >       char* tmpstr = ToNewCString(mScrollToRef);
> >       if (!tmpstr) {
> >         return;
> >       }
> >       nsUnescape(tmpstr);
> >       nsAutoCString unescapedRef;
> >       unescapedRef.Assign(tmpstr);
> >       free(tmpstr);
> > 
> >-      rv = encoding->DecodeWithoutBOMHandling(unescapedRef, ref);
> >-
> >-      if (NS_SUCCEEDED(rv) && !ref.IsEmpty()) {
> >-        rv = shell->GoToAnchor(ref, mChangeScrollPosWhenScrollingToRef);
> >+      rv = shell->GoToAnchor(NS_ConvertUTF8toUTF16(unescapedRef), mChangeScrollPosWhenScrollingToRef);
> Also here. Why we want to remove ref.IsEmpty() check?
> 
At that time, I just think GoToAnchor() has the empty check.
https://searchfox.org/mozilla-central/source/layout/base/PresShell.cpp#3089
However, we won't do another trial if GoToAnchor meet empty AnchorName.

If we do the check here, we could try other cases.
I will add the check back in next patch.

> 
> >+      // If UTF-8 URI failed then try to assume the string as a
> >+      // document's charset.
> >+      if (NS_FAILED(rv)) {
> >+        auto encoding = GetDocumentCharacterSet();
> While you're here, could you switch auto to the actual type. With auto I had
> to check GetDocumentCharacterSet return value
> to understand what thing it gives. It is not obvious from the name of the
> method what it returns.
> 
> Can we get some tests?
> 
Will do that.
Update the patch.
Attachment #8933247 - Attachment is obsolete: true
Attachment #8939770 - Flags: review?(bugs)
Comment on attachment 8939770 [details] [diff] [review]
Try GoToAnchor() with unescaped string before using document's charset.

># HG changeset patch
># User Alphan Chen <alchen@mozilla.com>
># Date 1515060759 -28800
>#      Thu Jan 04 18:12:39 2018 +0800
># Node ID a2ce18c2ad809f4eb304bcb10611a7692b71aadc
># Parent  5afa32a16a2f7ebbd8bf19e49efc50832110980d
>Bug 1421214 - Try GoToAnchor() with unescaped string before using document's charset. r?smaug
>
>diff --git a/docshell/base/nsDocShell.cpp b/docshell/base/nsDocShell.cpp
>--- a/docshell/base/nsDocShell.cpp
>+++ b/docshell/base/nsDocShell.cpp
>@@ -11957,16 +11957,29 @@ nsDocShell::ScrollToAnchor(bool aCurHasR
>     // page's charset.
>     nsresult rv = NS_ERROR_FAILURE;
>     NS_ConvertUTF8toUTF16 uStr(aNewHash);
>     if (!uStr.IsEmpty()) {
>       rv = shell->GoToAnchor(uStr, scroll,
>                              nsIPresShell::SCROLL_SMOOTH_AUTO);
>     }
> 
>+    if (NS_FAILED(rv)) {
>+      char* str = ToNewCString(aNewHash);
>+      if (!str) {
>+        return NS_ERROR_OUT_OF_MEMORY;
>+      }
>+      nsUnescape(str);
>+      if (!NS_ConvertUTF8toUTF16(str).IsEmpty()) {
>+        rv = shell->GoToAnchor(NS_ConvertUTF8toUTF16(str), scroll,
You don't want to call NS_ConvertUTF8toUTF16 twice. Conversion can be a bit slow and creating NS_ConvertUTF8toUTF16 may allocate memory (which is also slow).
If you look at the implementation of NS_ConvertUTF8toUTF16, it is actually class inheriting ns(Auto)String.
So, you could do
NS_ConvertUTF8toUTF16 utf16Str(str);
if (!str.IsEmpty()) {
  rv = shell->GoToAnchor(utf16Str, scroll, ...


>-      rv = encoding->DecodeWithoutBOMHandling(unescapedRef, ref);
>-
>-      if (NS_SUCCEEDED(rv) && !ref.IsEmpty()) {
>-        rv = shell->GoToAnchor(ref, mChangeScrollPosWhenScrollingToRef);
>+      if (!NS_ConvertUTF8toUTF16(unescapedRef).IsEmpty()) {
>+        rv = shell->GoToAnchor(NS_ConvertUTF8toUTF16(unescapedRef),
>+                               mChangeScrollPosWhenScrollingToRef);
>+      }
Same here.


those fixed, r+
Attachment #8939770 - Flags: review?(bugs) → review+
Keywords: checkin-needed
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/65df1e9c2132
Try GoToAnchor() with unescaped string before using document's charset. r=smaug
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/65df1e9c2132
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
Please request uplift if you feel comfortable, bearing in mind the last beta58 build before release candidates is tomorrow.
Flags: needinfo?(alchen)
Comment on attachment 8940906 [details] [diff] [review]
Try GoToAnchor() with unescaped string before using document's charset. r=smaug

Approval Request Comment
[Feature/Bug causing the regression]: Bug 1380323
[User impact if declined]: If the website didn't assign the proper character set, Firefox might not go to the right anchor in some cases(e.g.:'§').
[Is this code covered by automated tests?]: A wpt test
[Has the fix been verified in Nightly?]: Verified locally, landed on m-c 2 days ago.
[Needs manual test from QE? If yes, steps to reproduce]: No
[List of other uplifts needed for the feature/fix]: No
[Is the change risky?]: Minimal risk.
[Why is the change risky/not risky?]: The code changes are relatively small and self contained.  We have a good existing set of tests for this code to avoid regressions.
[String changes made/needed]: None
Flags: needinfo?(alchen)
Attachment #8940906 - Flags: approval-mozilla-beta?
Comment on attachment 8940906 [details] [diff] [review]
Try GoToAnchor() with unescaped string before using document's charset. r=smaug

Fix a regression that Firefox might not go to the right anchor if the website didn't specify a charset. Beta58+.
Attachment #8940906 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.