[FIX]TriggerLink/GetHrefCString charset weirdness

RESOLVED FIXED in mozilla1.5alpha

Status

()

Core
DOM
P2
normal
RESOLVED FIXED
16 years ago
15 years ago

People

(Reporter: bz, Assigned: bz)

Tracking

Trunk
mozilla1.5alpha
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 4 obsolete attachments)

GetHrefCString is an odd function.  It calls NS_MakeAbsoluteURIWithCharset, which

1)  Constructs a uri using the raw attr value and the document charset
2)  Calls GetSpec() on the uri and returns the spec

Then GetHrefCString returns the resulting spec out to the caller.  The problem
comes in when one clicks on a link.  In that case we take the returned spec and
create a URI out of it using the document charset again.  Unfortunately, the
output of GetSpec for some URI types (javascript: to be exact) is always UTF8! 
This leads to things like bug 161479, where the JS protocol handler is getting
passed escaped UTF8 data and "aCharset" which says "windows-1251".  It then
unescapes the UTF8, converts it using the windows-1251-to-unicode decoder, and
ends up with crap.

Some possible remedies:

A) The anchor-triggering code in nsGenericHTMLElement should not be using
   GetHrefCString, since TriggerLink will do the right charset conversion and
   base url resolution. It should just use the attr from the element.
B) GetHrefCString can be replaced with GetHrefURI (since it creates an nsIURI
   anyway).  TriggerLink can then take the nsIURI instead of a base URI and an
   href string.

I kinda like B myself....
GetHrefCString was originally written as part of performance optimization of
visited-link testing.  If you slow it down, btek will probably "notice".
Right.  I don't think either of the proposed changes would slow it down.  They
may in fact speed things up.
Once this is fixed, revisit the code in bug 161479
Blocks: 161479
OK... well, I can't fix offhand because consumers rely on getting data that
cannot be represented as a URI out of GetHrefCString!  Of course this only works
for ASCII hrefs; non-ascii ones are screwed.  I've filed bug 176904 on the issue.
Depends on: 176904
Priority: -- → P5
Target Milestone: --- → Future
Priority: P5 → P2
Target Milestone: Future → mozilla1.5beta
Comment on attachment 125962 [details] [diff] [review]
Somewhat like this, I think....

The basic idea is to switch to nsIURI all over for this stuff... the rest is
domino effect, with the exception of the removal of an unneeded security check
or two in nsXMLElement (since TriggerLink does a security check).
Attachment #125962 - Flags: superreview?(jst)
Attachment #125962 - Flags: review?(dbaron)
Summary: TriggerLink/GetHrefCString charset weirdness → [FIX]TriggerLink/GetHrefCString charset weirdness
Target Milestone: mozilla1.5beta → mozilla1.5alpha
Comment on attachment 125962 [details] [diff] [review]
Somewhat like this, I think....

In general, this looks good. I didn't have a real close look yet, but here's
some things I caught...

- In nsILink.h:

   /**
     * Get a pointer to the UTF-8 encoded canonical URL (i.e., fully
     * resolved to the base URL) that this link element points to.  The
     * buffer returned has been allocated for and should be deleted by
     * the caller.
     *
     * @param aBuf [out] A pointer to be filled in with a pointer to a
     *		   null-terminated string containing the canonical URL.
     *		   If the element has no HREF attribute, it is set to
     *		   nsnull.
     * @return NS_OK if the out pointer is filled in
     */
-  NS_IMETHOD GetHrefUTF8(char** aBuf) = 0;
+  NS_IMETHOD GetHrefURI(nsIURI** aURI) = 0;

The comment above this method needs to be updated.

- In nsGenericElement.h:

    * Load a link, putting together the proper URL from the pieces given.
    * @param aPresContext the pres context.
    * @param aVerb how the link will be loaded (replace page, new window, etc.)
    * @param aBaseURL the base URL to start with (content.baseURL, may be null)
    * @param aURLSpec the URL of the link (may be relative)
    * @param aTargetSpec the target (like target=, may be null)
    * @param aClick whether this was a click or not (if false, it assumes you
    *	     just hovered over the link)
    */
   nsresult TriggerLink(nsIPresContext* aPresContext,
			nsLinkVerb aVerb,
			nsIURI* aBaseURL,
-			const nsAString& aURLSpec,
+			nsIURI* aLinkURI,
			const nsAFlatString& aTargetSpec,
			PRBool aClick);

If you're passing in an nsIURI, you don't need a base URI. From looking at the
impl of this method, the base URI is used to do a security check, and that's
fine, but it shouldn't be called aBaseURI.

Other than that this looks good, but I'll need to have one more look once
that's fixed before I sr...
Attachment #125962 - Flags: superreview?(jst) → superreview-
Attachment #125993 - Flags: superreview?(jst)
Attachment #125993 - Flags: review?(dbaron)
Comment on attachment 125993 [details] [diff] [review]
Fixed comments.

sr=jst
Attachment #125993 - Flags: superreview?(jst) → superreview+
Does it matter that the extra security check was
nsIScriptSecurityManager::DISALLOW_FROM_MAIL but the new one is
nsIScriptSecurityManager::STANDARD ?
I don't believe it does, since this was only called from MaybeTriggerAutoLink,
and that has a separate check that disables auto links entirely in mail (see
http://lxr.mozilla.org/seamonkey/source/content/xml/content/src/nsXMLElement.cpp#388).

That said, perhaps I should restore the DISALLOW_FROM_MAIL security check and
remove the blanket-disabling code?  Not sure which is faster/better...
Securitywise, they're probably equivalent. DISALLOW_FROM_MAIL is pretty quick,
just looks for source schemes of imap:, mailbox:, or news: and fails on those. I
recommend using DISALLOW_FROM_MAIL.
Attachment #125993 - Attachment is obsolete: false
Attachment #125993 - Flags: review?(dbaron)
Comment on attachment 126794 [details] [diff] [review]
updated to David's and Mitch's comments

>-  NS_PRECONDITION(aBaseURI != nsnull, "no base URI");

perhaps you should leave this?

>+nsWebShell::GetLinkState(nsIURI* aLinkURI, nsLinkState& aState)
> {
>   aState = eLinkState_Unvisited;
> 
>   // no history, leave state unchanged
>   if (!mGlobalHistory)
>     return NS_OK;

>+  if (!aLinkURI) {
>+    // No uri means not a link
>+    aState = eLinkState_NotLink;
>+    return NS_OK;
>+  }

Perhaps the !aLinkURI check should be before the initialization of |aState| and
the null check of |mGlobalHistory|?  The |NotLink| state (which this never
returned before -- maybe it should just be a precondition instead?) shouldn't
depend on whether global history is present.


>@@ -169,28 +169,26 @@ nsXMLElement::GetXMLBaseURI(nsIURI **aUR
>     rv = content->GetAttr(kNameSpaceID_XML,nsHTMLAtoms::base,value);
>     PRInt32 value_len;
>     if (rv == NS_CONTENT_ATTR_HAS_VALUE) {
>       PRInt32 colon = value.FindChar(':');
>       PRInt32 slash = value.FindChar('/');
>       if (colon > 0 && !( slash >= 0 && slash < colon)) {
>         // Yay, we have absolute path!
>         // The complex looking if above is to make sure that we do not erroneously
>         // think a value of "./this:that" would have a scheme of "./that"
> 
>-        NS_ConvertUCS2toUTF8 str(value);
>-
>-        rv = MakeURI(str, nsnull, aURI);
>+        rv = MakeURI(value, nsnull, mDocument, aURI);
>         if (NS_FAILED(rv))
>           break;
> 
>         if (!base.IsEmpty()) { // XXXdarin base is always empty
>-          CopyUTF16toUTF8(base, str);
>+          NS_ConvertUTF16toUTF8 str(base);

There's a loop here.  Are you sure this change is right?  And is the XXXdarin
comment correct?

(Not done reading patch.)
> There's a loop here.  Are you sure this change is right?  And is the XXXdarin
> comment correct?

Oh, never mind -- I was getting source and dest mixed up.  (But what about the
XXXdarin comment?)
Comment on attachment 126794 [details] [diff] [review]
updated to David's and Mitch's comments

>         // convert here, rather than twice in NS_MakeAbsoluteURI and
>         // back again

This isn't relevant anymore.
> perhaps you should leave this?

Sure.  Done (but I still took out the early return there).

> Perhaps the !aLinkURI check should be before the initialization of |aState| and
> the null check of |mGlobalHistory|?

Done.

As for GetXMLBaseURI(), the patch in bug 209573 removes it entirely, so I just
made the minimal changes I could that would leave it compiling... (it has a
bunch of issues other than the one Darin's comment points to in any case).
> This isn't relevant anymore.

Indeed.  Removed (artifact of de-merging the patch...)
Comment on attachment 126794 [details] [diff] [review]
updated to David's and Mitch's comments

r=dbaron if you address my comments above, and also rename
NS_MakeAbsoluteURIWithCharset to NS_NewURIWithCharset (or WithDocumentCharset
WithDocument?), since it's now analogous to NS_NewURI rather than
NS_MakeAbsoluteURI.
Attachment #126794 - Flags: review?(dbaron) → review+
Also, NS_MakeAbsoluteURIWithCharset and MakeURI seem to be doing pretty much the
same thing.  Do we need the latter?

Comment 21

15 years ago
>   // URI can't be encoded in UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32-LE,
>   // UTF-32LE, UTF-32BE (yet?). Truncate it and leave it to default (UTF-8)
>   if (originCharset[0] == 'U' &&
>       originCharset[1] == 'T' &&
>       originCharset[2] == 'F')
>     originCharset.Truncate();

I think this code should be moved into nsStandardURL::Init so all callers of
NewURI will get this.  I can't imagine any server wanting URL-escaped UTF-32,
so we should just do this mapping for all URLs.
Created attachment 126797 [details] [diff] [review]
updated to comments 19--21.
Attachment #125993 - Attachment is obsolete: true
Attachment #126794 - Attachment is obsolete: true
Comment on attachment 126797 [details] [diff] [review]
updated to comments 19--21.

>+    // URI can't be encoded in UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32-LE,
>+    // UTF-32LE, UTF-32BE (yet?). Truncate it and leave it to default (UTF-8)
>+    if (mOriginCharset.Length() >= 3 &&
>+        mOriginCharset[0] == 'U' &&
>+        mOriginCharset[1] == 'T' &&
>+        mOriginCharset[2] == 'F')
>         mOriginCharset.Truncate();

This should be

>+        (mOriginCharset[0] == 'U' || mOriginCharset[0] == 'u') &&
>+        (mOriginCharset[1] == 'T' || mOriginCharset[1] == 't') &&
>+        (mOriginCharset[2] == 'F' || mOriginCharset[2] == 'f'))

since:
 * the URL code doesn't require the canonicalization of the charset alias code
 * we don't want to introduce a new canonicalization by uppercasing /
lowercasing the input

Also, the comment should reflect what the comment there originally said, which
is that we're storing UTF-8 as the empty string since they're equivalent so we
don't need to keep it around.
Attachment #126797 - Flags: review+
Created attachment 126798 [details] [diff] [review]
The patch I just checked in

This fixes the case issue in nsStandardURL
Attachment #126797 - Attachment is obsolete: true
Checked in, Tp looks ok, marking fixed.  I looked over the code in bug 161479
again, and we may still need it for some bizarre edge cases.... :(
Status: NEW → RESOLVED
Last Resolved: 15 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.