Closed Bug 339543 Opened 16 years ago Closed 16 years ago

microsummaries trigger bookmarked files to be downloaded

Categories

(Firefox Graveyard :: Microsummaries, defect, P1)

2.0 Branch
defect

Tracking

(Not tracked)

RESOLVED FIXED
Firefox 2 beta1

People

(Reporter: steventheconqueror, Assigned: myk)

References

()

Details

(Keywords: fixed1.8.1, regression, Whiteboard: 181b1+)

Attachments

(4 files, 2 obsolete files)

User-Agent:       Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.8.1a2) Gecko/20060526 BonEcho/2.0a2
Build Identifier: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.8.1a2) Gecko/20060526 BonEcho/2.0a2

Right click's "Bookmark This Link" option bookmarks a link _and_ attempts to save the link. This only seems to happen when doing this on binary files, not regular web page links. Not sure when this regressed since my last build was 20060503.

Reproducible: Always

Steps to Reproduce:
1. Go to test url
2. Right click a music file
3. Choose "Bookmark This Link"

Actual Results:  
An add bookmark dialog and a save as dialog pops up.

Expected Results:  
Only the add bookmark dialog should pop up.
Keywords: regression
OS: Linux → All
Version: unspecified → 2.0 Branch
Component: Menus → Bookmarks
QA Contact: menus → bookmarks
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: blocking-firefox2?
This is due to microsummaries. It tries to download a page that you're bookmarking to see if it offers any microsummaries. It uses a hidden iframe to do this, but that doesn't handle content-disposition: attachment or externally handled file types very well.

http://lxr.mozilla.org/seamonkey/source/browser/components/microsummaries/src/nsMicrosummaryService.js.in#1491
Assignee: nobody → myk
Status: NEW → ASSIGNED
Is that what's going on in bug 339378 too?

Loading untrusted content in a hidden iframe from chrome sounds like a bad idea to me...
Summary: "Bookmark This Link" on a file downloads the file → microsummaries trigger bookmarked files to be downloaded
Flags: blocking-firefox2? → blocking-firefox2+
Target Milestone: --- → Firefox 2 beta1
Whiteboard: [swag: 2d]
Per discussions with bz, it looks like the best way to solve this problem, on the branch anyway, is to download the content via XMLHttpRequest, then feed it to the iframe via a data: URL if the content is HTML.

> Loading untrusted content in a hidden iframe from chrome sounds like a bad idea
> to me...

They load documents with JavaScript disabled (docShell.allowJavascript = false).  But for added protection, they should also specify type="content".
Whiteboard: [swag: 2d] → [swag: 4d]
Whiteboard: [swag: 4d] → [swag: 6d]
This patch makes the microsummary service load HTML pages via data: URLs constructed from the responseText of XMLHttpRequest objects.  This way we can ensure the content is HTML before trying to load it in a hidden iframe; we thus avoid triggering a download dialog for non-HTML content.

In the process of fixing this bug I also cleaned up the mechanism by which the microsummary service loads both HTML and XML content.  It now uses a single MicrosummaryResource object to load both types, and the parts of the service that need to load content all do so in the same way.

For added security, I also turned off the loading of plugin and sub-iframe content and added type="content" to the hidden iframes.  And for better UI performance I set the LOAD_BACKGROUND flag on the channel.

Asking review from biesi, whom bz says is the right person to look at this.
Attachment #225351 - Flags: review?
Attachment #225351 - Flags: review? → review?(cbiesinger)
Blocks: 341099
This patch fixes a regression the first patch introduced that caused it to fail to generate microsummaries via remote generators.
Attachment #225351 - Attachment is obsolete: true
Attachment #225405 - Flags: review?(cbiesinger)
Attachment #225351 - Flags: review?(cbiesinger)
Here's a rough summary and diagram explaining what the code in this patch is doing.  The purpose of the patch is twofold:

1. to abstract out the loading and parsing of XML, HTML, and plaintext content,
   since various parts of the microsummaries code need to load these types
   of content, and sometimes they don't know in advance what type of content
   they'll receive.

2. to load and parse HTML content without any indication of the load in the UI,
   since these loads aren't directly requested by the user, and notifying
   the user about them would be confusing.

We handle these tasks with a new MicrosummaryResource object.  A caller which wants to load some content creates an instance of MicrosummaryResource, passing the content's URI to the constructor.  Then it calls MicrosummaryResource.load(), passing it a callback function.  MicrosummaryResource.load() then asynchronously loads (and parses, for XML and HTML) the content, calling the callback when it's done.

It's relatively easy for MicrosummaryResource to load and parse XML.  It just loads the URI via XMLHttpRequest, and XMLHttpRequest downloads the content and automatically parses it into an XMLDocument object.

It's even easier for MicrosummaryResource to load plaintext, since there's no parsing involved.  Again, it uses XMLHttpRequest, which downloads the raw content and then makes it available as a string.

But loading and parsing HTML is harder, because XMLHttpRequest doesn't parse it, and there aren't good JavaScript-accessible APIs to the HTML parser.  So MicrosummaryResource downloads the raw content via XMLHttpRequest, stuffs it into a data: URL, creates a hidden iframe, and loads the data: URL in the iframe, causing the iframe to parse it into an HTMLDocument object.

Once the callback function is done doing whatever it needs to do with the parsed content, it calls MicrosummaryResource.destroy() to free up any memory that might otherwise be locked up in reference cycles by the MicrosummaryResource instance.
Comment on attachment 225405 [details] [diff] [review]
patch v2: fixes a regression in the first patch

Thanks a lot for the diagram!

Now to the patch...

  _handleLocalGenerator: function(resource) {

It seems like the rest of the file tries to avoid anonymous functions, so I guess this one should get a name too.

  _handleNewGenerator: function(resource) {

same here.


    serializer.serializeToStream(rootNode, fos, null);

Out of curiousity, why does this serialize the root node and not the document?

    // XXX Make sure the argument is a DOM node

That comment seems a bit wrong now that the argument is a document.

    var rootNode = xmlDocument.getElementsByTagNameNS(MICSUM_NS, "generator")[0];

This variable name is kind of misleading given that the node is not necessarily the root node.

    // XXX We should assert if the file doesn't contain a generator.

No you shouldn't, assertions are for programming errors, not for malformed input.

    if (!resource.isXML && !resource.contentType == "text/html")

Hm, wouldn't using != be more usual?

      var generatorURI = this._ios.newURI(link.href, null, null);

Shouldn't relative URIs be supported? Also, I'd pass resource.content.characterSet as the second argument here.

  // XXX Perhaps use nsIScriptSecurityManager.checkLoadURI instead.

That would be better IMO, because then you can also avoid the hardcoded list here and support other schemes.

But it should probably be checkLoadURIWithPrincipal (check with bz); and, you need the requesting uri/document/principal here.

  _isXML: null,

this is a boolean variable. Why not init to false instead?

                   this.contentType.search(/^.+\/.+\+xml$/) != -1);

    /^.+\/.+\+xml$/.test(this.contentType));

But I'm not sure if it's even possible to have a content type without a slash... (I've seen code that prevents that, but I don't know how uniformly that's applied)

+      // Per http://www.w3.org/TR/XMLHttpRequest/, the charset of responseText
+      // should be either the value specified by the HTTP response or "UTF-8".
+      var charset = request.channel.contentCharset || "UTF-8";
+      var dataURISpec = "data:text/html;charset=" + charset + "," + request.responseText;
+      var dataURI = this._ios.newURI(dataURISpec, null, null);

This is not right. responseText itself will always be UTF-16 because it is a DOMString (XMLHttpRequest will convert it to that when it was something else). When you pass this to newURI, it becomes UTF-8. So the charset in the data URL here should always be UTF-8.


It seems that these ifs treat text/plain and unknown types the same. Why not have text/plain fall into the else-clause as well? Maybe with a comment that that catches text/plain as well as unknown types.

    var mainWindow = document.getElementById('main-window');

Why not just put this below the root element?

 	
    //this._iframe.setAttribute('src', dataURI.spec);

I'd remove this and replace it with a comment why you use the uriloader (i.e. so that you can set LOAD_BACKGROUND)

  // Apparently the Z order enumerator is broken on Linux.

is a bug filed on this? should probably mention its number in this comment.

     var tabBrowser = win.document.getElementById("content");

What if there is no tabbrowser in the window? E.g. JS Console, or preferences, or whatever.

    for ( i = 0; i < tabBrowser.browsers.length; i++ ) {

var i?
Attachment #225405 - Flags: review?(cbiesinger) → review-
>   _handleLocalGenerator: function(resource) {
> 
> It seems like the rest of the file tries to avoid anonymous functions, so I
> guess this one should get a name too.
> 
>   _handleNewGenerator: function(resource) {
> 
> same here.

Good catch.  I've added names for them.


>     serializer.serializeToStream(rootNode, fos, null);
> 
> Out of curiousity, why does this serialize the root node and not the document?

Good question.  Just a mistake, I think.  Serializing the document prepends the standard XML processing instruction to it, which seems like a good thing, so I've changed this to serialize the whole document.


>     // XXX Make sure the argument is a DOM node
> 
> That comment seems a bit wrong now that the argument is a document.

Indeed.  I've replaced it with this comment:

    // XXX Make sure the argument is a valid generator XML document.


>     var rootNode = xmlDocument.getElementsByTagNameNS(MICSUM_NS,
> "generator")[0];
> 
> This variable name is kind of misleading given that the node is not necessarily
> the root node.

Indeed.  I'll probably require the generator node to be the root node in the future, but in the meantime I've renamed this to generatorNode.


>     // XXX We should assert if the file doesn't contain a generator.
> 
> No you shouldn't, assertions are for programming errors, not for malformed
> input.

Err, right, I meant to say we should throw.  Fixed.


>     if (!resource.isXML && !resource.contentType == "text/html")
> 
> Hm, wouldn't using != be more usual?

D'oh!  Fixed.


>       var generatorURI = this._ios.newURI(link.href, null, null);
> 
> Shouldn't relative URIs be supported?

They are supported, since the browser converts relative URLs into absolute ones automatically.  For example, given the following link tag at http://weblogs.mozillazine.org/asa/:

<link REL="icon" HREF="notblog.png" TYPE="image/png">

... link.getAttribute("href") is the relative URL "notblog.png", but link.href (i.e. the "href" property as opposed to the "href" attribute) is "http://weblogs.mozillazine.org/asa/notblog.png".


> Also, I'd pass resource.content.characterSet as the second argument here.

Good point.  I've added that.


>   // XXX Perhaps use nsIScriptSecurityManager.checkLoadURI instead.
> 
> That would be better IMO, because then you can also avoid the hardcoded list
> here and support other schemes.

Right, which is probably a good thing, although there is something to be said for the current method of whitelisting specific schemes rather than just blacklisting known dangerous schemes, which is what calling checkLoadURI[WithPrincipal] with the DISALLOW_SCRIPT_OR_DATA or DISALLOW_SCRIPT flags does.


> But it should probably be checkLoadURIWithPrincipal (check with bz); and, you
> need the requesting uri/document/principal here.

In most of these cases, there's no requesting uri/document; the microsummary service itself is doing the requesting.  The only exception is when we load a URI specified by a link tag, for which it seems like checkLoadURI is the right function based on existing usage in the codebase.

I'll check with bz to see what principal best represents the microsummary service.  In the meantime, I have added a checkLoadURI call to the extractFromPage method to check URIs in link tags.  But given that bz is out through the 20th, and since we were whitelisting before as well, perhaps I can tackle that issue in a separate bug?


>   _isXML: null,
> 
> this is a boolean variable. Why not init to false instead?

In theory initializing to null lets us differentiate between "not determined" and "known not to be XML," but in practice I'm not using it like that, so I changed this to initialize to false, which makes the code a bit cleaner (now we only have to set the value if it turns out we do have XML).


>                    this.contentType.search(/^.+\/.+\+xml$/) != -1);
> 
>     /^.+\/.+\+xml$/.test(this.contentType));
> 
> But I'm not sure if it's even possible to have a content type without a
> slash... (I've seen code that prevents that, but I don't know how uniformly
> that's applied)

Hmm, that looks like a cleaner syntax.  Ok, changed.

I also don't know if it's possible to have a content type without a slash.  RFC 3023, which defines the XML content types, says "applications may match for types that represent XML MIME entities by comparing the subtype to the pattern '*/*+xml'," so I built the regular expression equivalent of that expression, interpreting '*' to mean "one or more characters."


> +      // Per http://www.w3.org/TR/XMLHttpRequest/, the charset of responseText
> +      // should be either the value specified by the HTTP response or "UTF-8".
> +      var charset = request.channel.contentCharset || "UTF-8";
> +      var dataURISpec = "data:text/html;charset=" + charset + "," +
> request.responseText;
> +      var dataURI = this._ios.newURI(dataURISpec, null, null);
> 
> This is not right. responseText itself will always be UTF-16 because it is a
> DOMString (XMLHttpRequest will convert it to that when it was something else).
> When you pass this to newURI, it becomes UTF-8. So the charset in the data URL
> here should always be UTF-8.

Ok, fixed.  Thanks for the explanation.  I added it as a comment on the code.


> It seems that these ifs treat text/plain and unknown types the same. Why not
> have text/plain fall into the else-clause as well? Maybe with a comment that
> that catches text/plain as well as unknown types.

Good point.  Ok, done.


>     var mainWindow = document.getElementById('main-window');
> 
> Why not just put this below the root element?

main-window is the root element.  Changed this to the following to make it clearer:

    var rootElement = document.documentElement;


>     //this._iframe.setAttribute('src', dataURI.spec);
> 
> I'd remove this and replace it with a comment why you use the uriloader (i.e.
> so that you can set LOAD_BACKGROUND)

Good idea.  Ok, done.


>   // Apparently the Z order enumerator is broken on Linux.
> 
> is a bug filed on this? should probably mention its number in this comment.

Yeah, it's bug 156333.  I have added the number to the comment.


>      var tabBrowser = win.document.getElementById("content");
> 
> What if there is no tabbrowser in the window? E.g. JS Console, or preferences,
> or whatever.

I only get windows of type "navigator:browser", which all have tabbrowsers.


>     for ( i = 0; i < tabBrowser.browsers.length; i++ ) {
> 
> var i?

Right, fixed.

This patch also:

* disables authentication dialogs via docShell.allowAuth to further mask the 
  download;
* changes the name of the "fos" variable to "outputStream" to make its function 
  clearer.
Attachment #225405 - Attachment is obsolete: true
Attachment #225855 - Flags: review?(cbiesinger)
(In reply to comment #8)
> They are supported, since the browser converts relative URLs into absolute ones
> automatically.

Oh, right. Thanks for reminding me.

> Right, which is probably a good thing, although there is something to be said
> for the current method of whitelisting specific schemes rather than just
> blacklisting known dangerous schemes, which is what calling
> checkLoadURI[WithPrincipal] with the DISALLOW_SCRIPT_OR_DATA or DISALLOW_SCRIPT
> flags does.

OK, it's just that I don't like hardcoded scheme lists; the security manager list should ideally not be a list but something to be decided by individual protocol handlers. that way, it would be extendable by extensions. anyway... this is OK for now.

> In most of these cases, there's no requesting uri/document; the microsummary
> service itself is doing the requesting.  The only exception is when we load a
> URI specified by a link tag, for which it seems like checkLoadURI is the right
> function based on existing usage in the codebase.

Oh, hm. That's a good point. You could maybe use the URI of the bookmark as requesting URI...

> But given that bz is out
> through the 20th, and since we were whitelisting before as well, perhaps I can
> tackle that issue in a separate bug?

That's fine.

> I also don't know if it's possible to have a content type without a slash.  RFC
> 3023, which defines the XML content types, says "applications may match for
> types that represent XML MIME entities by comparing the subtype to the pattern
> '*/*+xml'," so I built the regular expression equivalent of that expression,
> interpreting '*' to mean "one or more characters."

Ah, ok. Maybe mentioning in a comment where the regex came from would be a good idea :-)

> I only get windows of type "navigator:browser", which all have tabbrowsers.

Ah, I missed that, thanks.

Comment on attachment 225855 [details] [diff] [review]
patch v3: addresses review comments

r=biesi

note that I'm not actually a peer for this module, so I'm not sure if that review suffices.
Attachment #225855 - Flags: review?(cbiesinger) → review+
Comment on attachment 225855 [details] [diff] [review]
patch v3: addresses review comments

Thanks Christian; requesting super-review from mconnor in case your review doesn't suffice.
Attachment #225855 - Flags: superreview?(mconnor)
Priority: -- → P1
Hardware: PC → All
Whiteboard: [swag: 6d] → [swag: 0.50d]
Blocks: 341340
Blocks: 342508
Blocks: 341840
Blocks: 342591
Patch v3 is slightly bit-rotted on the trunk due to the checkin for bug 342226.  Here's a version of patch v3 updated to work on the trunk.
Attachment #227049 - Flags: review?(mconnor)
In case it makes the patch easier to review, here's the exact same patch as v3 for the trunk but with some added spaces as hints to nudge the patch tool into creating a diff that correctly shows which new sections of the v3 patch are replacing which old sections.
Blocks: 342908
Whiteboard: [swag: 0.50d] → [swag: 0.50d] 181b1+
Attachment #225855 - Flags: superreview?(mconnor) → superreview+
Attachment #227049 - Flags: review?(mconnor) → review+
Fix checked in to the trunk.  I'll let it bake there for a day before requesting branch approval.
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Whiteboard: [swag: 0.50d] 181b1+ → 181b1+
Comment on attachment 227049 [details] [diff] [review]
patch v3 for the trunk

Requesting 1.8 branch approval.  This patch has baked on the trunk for a day without any new regression bugs having been filed on it.

This patch does present a moderate risk of regressions, since it makes significant changes to the way the microsummary service loads network resources.  But it also blocks several other important bugs from landing, so it would be useful to get it on the branch sooner than later.

I'll keep my eye out for regressions after it lands, even if that means watching incoming bugs during the upcoming four day weekend.
Attachment #227049 - Flags: approval1.8.1?
Comment on attachment 227049 [details] [diff] [review]
patch v3 for the trunk

a=mconnor on behalf of drivers, this is a critical path bug for b1, and should land earlier rather than later
Attachment #227049 - Flags: approval1.8.1? → approval1.8.1+
Fix checked in to the 1.8 branch.
Keywords: fixed1.8.1
Blocks: 342859
Component: Bookmarks → Microsummaries
Blocks: 338895
Product: Firefox → Firefox Graveyard
You need to log in before you can comment on or make changes to this bug.