Closed Bug 166996 Opened 22 years ago Closed 21 years ago

[FIX]TriggerLink/GetHrefCString charset weirdness

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla1.5alpha

People

(Reporter: bzbarsky, Assigned: bzbarsky)

References

Details

Attachments

(1 file, 4 obsolete files)

Somewhat like this, I think.... 21 years ago Boris Zbarsky [:bzbarsky] 64.82 KB, patch	jst : superreview-	Details \| Diff \| Splinter Review
Fixed comments. 21 years ago Boris Zbarsky [:bzbarsky] 66.29 KB, patch	jst : superreview+	Details \| Diff \| Splinter Review
updated to David's and Mitch's comments 21 years ago Boris Zbarsky [:bzbarsky] 59.85 KB, patch	dbaron : review+	Details \| Diff \| Splinter Review
updated to comments 19--21. 21 years ago Boris Zbarsky [:bzbarsky] 62.55 KB, patch	dbaron : review+	Details \| Diff \| Splinter Review
The patch I just checked in 21 years ago Boris Zbarsky [:bzbarsky] 62.75 KB, patch		Details \| Diff \| Splinter Review

Boris Zbarsky [:bzbarsky]

Assignee

Description

•

22 years ago

GetHrefCString is an odd function.  It calls NS_MakeAbsoluteURIWithCharset, which

1)  Constructs a uri using the raw attr value and the document charset
2)  Calls GetSpec() on the uri and returns the spec

Then GetHrefCString returns the resulting spec out to the caller.  The problem
comes in when one clicks on a link.  In that case we take the returned spec and
create a URI out of it using the document charset again.  Unfortunately, the
output of GetSpec for some URI types (javascript: to be exact) is always UTF8! 
This leads to things like bug 161479, where the JS protocol handler is getting
passed escaped UTF8 data and "aCharset" which says "windows-1251".  It then
unescapes the UTF8, converts it using the windows-1251-to-unicode decoder, and
ends up with crap.

Some possible remedies:

A) The anchor-triggering code in nsGenericHTMLElement should not be using
   GetHrefCString, since TriggerLink will do the right charset conversion and
   base url resolution. It should just use the attr from the element.
B) GetHrefCString can be replaced with GetHrefURI (since it creates an nsIURI
   anyway).  TriggerLink can then take the nsIURI instead of a base URI and an
   href string.

I kinda like B myself....

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 1

•

22 years ago

GetHrefCString was originally written as part of performance optimization of
visited-link testing.  If you slow it down, btek will probably "notice".

Boris Zbarsky [:bzbarsky]

Assignee

Comment 2

•

22 years ago

Right.  I don't think either of the proposed changes would slow it down.  They
may in fact speed things up.

Boris Zbarsky [:bzbarsky]

Assignee

Comment 3

•

22 years ago

Once this is fixed, revisit the code in bug 161479

Blocks: 161479

Boris Zbarsky [:bzbarsky]

Assignee

Comment 4

•

22 years ago

OK... well, I can't fix offhand because consumers rely on getting data that
cannot be represented as a URI out of GetHrefCString!  Of course this only works
for ASCII hrefs; non-ascii ones are screwed.  I've filed bug 176904 on the issue.

Depends on: 176904

Priority: -- → P5

Target Milestone: --- → Future

Boris Zbarsky [:bzbarsky]

Assignee

Updated

•

21 years ago

Priority: P5 → P2

Target Milestone: Future → mozilla1.5beta

Boris Zbarsky [:bzbarsky]

Assignee

Updated

•

21 years ago

Depends on: 209566

Boris Zbarsky [:bzbarsky]

Assignee

Comment 5

•

21 years ago

Attached patch Somewhat like this, I think.... (obsolete) — Details — Splinter Review

Boris Zbarsky [:bzbarsky]

Assignee

Comment 6

•

21 years ago

Comment on attachment 125962 [details] [diff] [review]
Somewhat like this, I think....

The basic idea is to switch to nsIURI all over for this stuff... the rest is
domino effect, with the exception of the removal of an unneeded security check
or two in nsXMLElement (since TriggerLink does a security check).

Attachment #125962 - Flags: superreview?(jst)

Attachment #125962 - Flags: review?(dbaron)

Boris Zbarsky [:bzbarsky]

Assignee

Updated

•

21 years ago

Summary: TriggerLink/GetHrefCString charset weirdness → [FIX]TriggerLink/GetHrefCString charset weirdness

Target Milestone: mozilla1.5beta → mozilla1.5alpha

Johnny Stenback (:jst)

Comment 7

•

21 years ago

Comment on attachment 125962 [details] [diff] [review]
Somewhat like this, I think....

In general, this looks good. I didn't have a real close look yet, but here's
some things I caught...

- In nsILink.h:

   /**
     * Get a pointer to the UTF-8 encoded canonical URL (i.e., fully
     * resolved to the base URL) that this link element points to.  The
     * buffer returned has been allocated for and should be deleted by
     * the caller.
     *
     * @param aBuf [out] A pointer to be filled in with a pointer to a
     *		   null-terminated string containing the canonical URL.
     *		   If the element has no HREF attribute, it is set to
     *		   nsnull.
     * @return NS_OK if the out pointer is filled in
     */
-  NS_IMETHOD GetHrefUTF8(char** aBuf) = 0;
+  NS_IMETHOD GetHrefURI(nsIURI** aURI) = 0;

The comment above this method needs to be updated.

- In nsGenericElement.h:

    * Load a link, putting together the proper URL from the pieces given.
    * @param aPresContext the pres context.
    * @param aVerb how the link will be loaded (replace page, new window, etc.)
    * @param aBaseURL the base URL to start with (content.baseURL, may be null)
    * @param aURLSpec the URL of the link (may be relative)
    * @param aTargetSpec the target (like target=, may be null)
    * @param aClick whether this was a click or not (if false, it assumes you
    *	     just hovered over the link)
    */
   nsresult TriggerLink(nsIPresContext* aPresContext,
			nsLinkVerb aVerb,
			nsIURI* aBaseURL,
-			const nsAString& aURLSpec,
+			nsIURI* aLinkURI,
			const nsAFlatString& aTargetSpec,
			PRBool aClick);

If you're passing in an nsIURI, you don't need a base URI. From looking at the
impl of this method, the base URI is used to do a security check, and that's
fine, but it shouldn't be called aBaseURI.

Other than that this looks good, but I'll need to have one more look once
that's fixed before I sr...

Attachment #125962 - Flags: superreview?(jst) → superreview-

Boris Zbarsky [:bzbarsky]

Assignee

Updated

•

21 years ago

Attachment #125962 - Flags: review?(dbaron)

Boris Zbarsky [:bzbarsky]

Assignee

Comment 8

•

21 years ago

Attached patch Fixed comments. (obsolete) — Details — Splinter Review

Attachment #125962 - Attachment is obsolete: true

Boris Zbarsky [:bzbarsky]

Assignee

Updated

•

21 years ago

Attachment #125993 - Flags: superreview?(jst)

Attachment #125993 - Flags: review?(dbaron)

Johnny Stenback (:jst)

Comment 9

•

21 years ago

Comment on attachment 125993 [details] [diff] [review]
Fixed comments.

sr=jst

Attachment #125993 - Flags: superreview?(jst) → superreview+

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 10

•

21 years ago

Does it matter that the extra security check was
nsIScriptSecurityManager::DISALLOW_FROM_MAIL but the new one is
nsIScriptSecurityManager::STANDARD ?

Boris Zbarsky [:bzbarsky]

Assignee

Comment 11

•

21 years ago

I don't believe it does, since this was only called from MaybeTriggerAutoLink,
and that has a separate check that disables auto links entirely in mail (see
http://lxr.mozilla.org/seamonkey/source/content/xml/content/src/nsXMLElement.cpp#388).

That said, perhaps I should restore the DISALLOW_FROM_MAIL security check and
remove the blanket-disabling code?  Not sure which is faster/better...

Boris Zbarsky [:bzbarsky]

Assignee

Updated

•

21 years ago

Blocks: 209573

Mitchell Stoltz (not reading bugmail)

Comment 12

•

21 years ago

Securitywise, they're probably equivalent. DISALLOW_FROM_MAIL is pretty quick,
just looks for source schemes of imap:, mailbox:, or news: and fails on those. I
recommend using DISALLOW_FROM_MAIL.

Boris Zbarsky [:bzbarsky]

Assignee

Comment 13

•

21 years ago

Attached patch updated to David's and Mitch's comments (obsolete) — Details — Splinter Review

Boris Zbarsky [:bzbarsky]

Assignee

Updated

•

21 years ago

Attachment #125993 - Attachment is obsolete: true

Boris Zbarsky [:bzbarsky]

Assignee

Updated

•

21 years ago

Attachment #125993 - Attachment is obsolete: false

Attachment #125993 - Flags: review?(dbaron)

Boris Zbarsky [:bzbarsky]

Assignee

Updated

•

21 years ago

Attachment #126794 - Flags: review?(dbaron)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 14

•

21 years ago

Comment on attachment 126794 [details] [diff] [review]
updated to David's and Mitch's comments

>-  NS_PRECONDITION(aBaseURI != nsnull, "no base URI");

perhaps you should leave this?

>+nsWebShell::GetLinkState(nsIURI* aLinkURI, nsLinkState& aState)
> {
>   aState = eLinkState_Unvisited;
> 
>   // no history, leave state unchanged
>   if (!mGlobalHistory)
>     return NS_OK;

>+  if (!aLinkURI) {
>+    // No uri means not a link
>+    aState = eLinkState_NotLink;
>+    return NS_OK;
>+  }

Perhaps the !aLinkURI check should be before the initialization of |aState| and
the null check of |mGlobalHistory|?  The |NotLink| state (which this never
returned before -- maybe it should just be a precondition instead?) shouldn't
depend on whether global history is present.


>@@ -169,28 +169,26 @@ nsXMLElement::GetXMLBaseURI(nsIURI **aUR
>     rv = content->GetAttr(kNameSpaceID_XML,nsHTMLAtoms::base,value);
>     PRInt32 value_len;
>     if (rv == NS_CONTENT_ATTR_HAS_VALUE) {
>       PRInt32 colon = value.FindChar(':');
>       PRInt32 slash = value.FindChar('/');
>       if (colon > 0 && !( slash >= 0 && slash < colon)) {
>         // Yay, we have absolute path!
>         // The complex looking if above is to make sure that we do not erroneously
>         // think a value of "./this:that" would have a scheme of "./that"
> 
>-        NS_ConvertUCS2toUTF8 str(value);
>-
>-        rv = MakeURI(str, nsnull, aURI);
>+        rv = MakeURI(value, nsnull, mDocument, aURI);
>         if (NS_FAILED(rv))
>           break;
> 
>         if (!base.IsEmpty()) { // XXXdarin base is always empty
>-          CopyUTF16toUTF8(base, str);
>+          NS_ConvertUTF16toUTF8 str(base);

There's a loop here.  Are you sure this change is right?  And is the XXXdarin
comment correct?

(Not done reading patch.)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 15

•

21 years ago

> There's a loop here.  Are you sure this change is right?  And is the XXXdarin
> comment correct?

Oh, never mind -- I was getting source and dest mixed up.  (But what about the
XXXdarin comment?)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 16

•

21 years ago

Comment on attachment 126794 [details] [diff] [review]
updated to David's and Mitch's comments

>         // convert here, rather than twice in NS_MakeAbsoluteURI and
>         // back again

This isn't relevant anymore.

Boris Zbarsky [:bzbarsky]

Assignee

Comment 17

•

21 years ago

> perhaps you should leave this?

Sure.  Done (but I still took out the early return there).

> Perhaps the !aLinkURI check should be before the initialization of |aState| and
> the null check of |mGlobalHistory|?

Done.

As for GetXMLBaseURI(), the patch in bug 209573 removes it entirely, so I just
made the minimal changes I could that would leave it compiling... (it has a
bunch of issues other than the one Darin's comment points to in any case).

Boris Zbarsky [:bzbarsky]

Assignee

Comment 18

•

21 years ago

> This isn't relevant anymore.

Indeed.  Removed (artifact of de-merging the patch...)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 19

•

21 years ago

Comment on attachment 126794 [details] [diff] [review]
updated to David's and Mitch's comments

r=dbaron if you address my comments above, and also rename
NS_MakeAbsoluteURIWithCharset to NS_NewURIWithCharset (or WithDocumentCharset
WithDocument?), since it's now analogous to NS_NewURI rather than
NS_MakeAbsoluteURI.

Attachment #126794 - Flags: review?(dbaron) → review+

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 20

•

21 years ago

Also, NS_MakeAbsoluteURIWithCharset and MakeURI seem to be doing pretty much the
same thing.  Do we need the latter?

Darin Fisher

Comment 21

•

21 years ago

>   // URI can't be encoded in UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32-LE,
>   // UTF-32LE, UTF-32BE (yet?). Truncate it and leave it to default (UTF-8)
>   if (originCharset[0] == 'U' &&
>       originCharset[1] == 'T' &&
>       originCharset[2] == 'F')
>     originCharset.Truncate();

I think this code should be moved into nsStandardURL::Init so all callers of
NewURI will get this.  I can't imagine any server wanting URL-escaped UTF-32,
so we should just do this mapping for all URLs.

Boris Zbarsky [:bzbarsky]

Assignee

Comment 22

•

21 years ago

Attached patch updated to comments 19--21. (obsolete) — Details — Splinter Review

Attachment #125993 - Attachment is obsolete: true

Attachment #126794 - Attachment is obsolete: true

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 23

•

21 years ago

Comment on attachment 126797 [details] [diff] [review]
updated to comments 19--21.

>+    // URI can't be encoded in UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32-LE,
>+    // UTF-32LE, UTF-32BE (yet?). Truncate it and leave it to default (UTF-8)
>+    if (mOriginCharset.Length() >= 3 &&
>+        mOriginCharset[0] == 'U' &&
>+        mOriginCharset[1] == 'T' &&
>+        mOriginCharset[2] == 'F')
>         mOriginCharset.Truncate();

This should be

>+        (mOriginCharset[0] == 'U' || mOriginCharset[0] == 'u') &&
>+        (mOriginCharset[1] == 'T' || mOriginCharset[1] == 't') &&
>+        (mOriginCharset[2] == 'F' || mOriginCharset[2] == 'f'))

since:
 * the URL code doesn't require the canonicalization of the charset alias code
 * we don't want to introduce a new canonicalization by uppercasing /
lowercasing the input

Also, the comment should reflect what the comment there originally said, which
is that we're storing UTF-8 as the empty string since they're equivalent so we
don't need to keep it around.

Attachment #126797 - Flags: review+

Boris Zbarsky [:bzbarsky]

Assignee

Comment 24

•

21 years ago

Attached patch The patch I just checked in — Details — Splinter Review

This fixes the case issue in nsStandardURL

Attachment #126797 - Attachment is obsolete: true

Boris Zbarsky [:bzbarsky]

Assignee

Comment 25

•

21 years ago

Checked in, Tp looks ok, marking fixed.  I looked over the code in bug 161479
again, and we may still need it for some bizarre edge cases.... :(

Status: NEW → RESOLVED

Closed: 21 years ago

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Updated

•

5 years ago

Component: DOM → DOM: Core & HTML

You need to log in before you can comment on or make changes to this bug.