Closed Bug 274386 Opened 20 years ago Closed 20 years ago

RSS including HTML entities in <description>

Categories

(addons.mozilla.org Graveyard :: Public Pages, defect)

All
Windows XP
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: philor, Assigned: Bugzilla-alanjstrBugs)

References

Details

Attachments

(1 file, 4 obsolete files)

The "Newest Firefox Extensions" RSS feed currently includes an <item> for
http://update.mozilla.org/extensions/moreinfo.php?id=420&vid=1382 which has a
fairly borked description even in the HTML (slashes escaping single quotes), but
in the RSS it's pointing out that the generating code is apparently doing a
no-no, including HTML entities.

That's probably the number one way of making an RSS feed not-well-formed XML,
and none of the ways around it (including the RSS 0.91 DTD, or a DTD fragment
that defines them, mostly) work at all well: far far better to pick a charset
(UTF-8 unless you absolutely have to have something different) and then convert
any entities coming out of your db to actual characters, rather than what looks
to be happening, converting UTF-8 characters into two broken entities based on
the ISO-8859-1 characters that the UTF-8 bytes sort-of resemble (a » converted
to &Acirc;&raquo;).

(For pedantic purposes, if you are using encoding="utf-8", you should also be
sending the mime-type application/xml, or at the very least include a charset in
the content-type header, because text/xml with no charset *is* US-ASCII, no
matter what encoding you put in the XML Declaration. Very few consumers actually
follow that requirement, but it's easy enough to avoid the problem entirely.)
Attached patch Proper XML mimetype (obsolete) — Splinter Review
The backslashes you see are there because that's what was given to me, so
that's what I pasted.  But yes, it should be application/xml.  I've re-saved
the DB entry.
Assignee: psychoticwolf → Bugzilla-alanjstrBugs
Status: NEW → ASSIGNED
Attachment #168605 - Flags: first-review?(psychoticwolf)
Hmmm.  Went back to bug 273206 comment 4.  There were no backslashes there. 
Since I re-saved the data, it isn't appearing with backslashes any more.
http://update-beta.mozilla.org/rss/?application=firefox&type=E&list=newest
displays this:
XML Parsing Error: undefined entity
Location: http://update-beta.mozilla.org/rss/?application=firefox&type=E&list=newest
Line Number 31, Column 36:- Menu Items under 'Go' and 'View--&Acirc;&raquo;Sidebar'

That's in the description of the "Enhanced History Manager 0.4.1.02". It
complains about &Acirc;. I don't think that belongs there (it's the letter A
with a circumflex), and should be removed.
Ok, changed the text to "-->"
Thanks, that fixed it.
Comment on attachment 168605 [details] [diff] [review]
Proper XML mimetype

Patch looks fine. Though does it actually decode the html entities (if needed)
and ensure DB data will always yield a proper RSS feed?
Attachment #168605 - Flags: first-review?(psychoticwolf) → first-review+
(In reply to comment #6)
> Though does it actually decode the html entities (if needed)
> and ensure DB data will always yield a proper RSS feed?

Nope, now that I finally figured out where the source is (I'm slow, I am), what
this bug needs is a s/htmlentities/htmlspecialchars/g on inc_rssfeed.php. You
don't want to entify everything, only & and < (you'll get > too, but with
ENT_NOQUOTES since you aren't sticking anything in an attribute, you can at
least avoid the overhead of replacing '/"). Personally, I'd leave it out
entirely on things like $sitelanguage, and require callers to behave themselves,
since these feeds are going to beat the servers to a bloody pulp anyway.
Language is required to be an ISO639/ISO3166 code, which means it's US-ASCII:
any caller that doesn't make it so deserves to explode.
I'm not sure if I was supposed to get rid of the UTF stuff.  I don't need to
make the stuff inside entity-safe?
Attachment #168605 - Attachment is obsolete: true
Attachment #169340 - Flags: first-review?(psychoticwolf)
Comment on attachment 169340 [details] [diff] [review]
Wrap lots of stuff in htmlspecialchars

There's no need to htmlspecialchars() the same variable more than once.
$description is an example. If you've ran the function on the variable when it
was assigned you don't need to do it again when it's echoed.
This patch is also a mess to read.

@Phil: Your thoughts on this patch, with regard to it accomplishing its goal?
Attachment #169340 - Flags: first-review?(psychoticwolf) → first-review-
Bumping target milestone, though this change can be applied to the site when its
complete. :-)
Target Milestone: 1.0 → 1.1
Version: unspecified → 0.9
>+echo "  <title>" . htmlspecialchars($sitetitle) . "::" .
htmlspecialchars($list) . " " . $listType . "</title>\n";

Function calls are cheap, but they aren't free. $list, yes, because you
shouldn't break on bad user input, but all of the $sitefoo vars were set by a
caller who knew he was creating XML: if he puts in "<stuff&junk>", he should
break, rather than making millions of extra function calls every day.

>+echo "  <description>" . htmlspecialchars($description) . "</description>\n";

$sitedescription, no?

>+    $title = htmlspecialchars($row["Title"]);
>+    $description = htmlspecialchars($row["Description"]);

htmlspecialchars($row["Title"],ENT_NOQUOTES,UTF-8) for both would be better: you
aren't putting it in an attribute, so there's no point in escaping quotes, and
no sense risking having a multi-byte UTF-8 character where a byte in the middle
looks like a < to the default ISO-8859-1 charset. Get those two, right there
only, plus the content-type header that you still also need, as
header("Content-type: application/xml; charset=utf-8");, and your only risk of
being not-well-formed ought to be if someone can put a non-UTF-8 character into
the db.
Yeah, i got carried away.  patch later.
*** Bug 275775 has been marked as a duplicate of this bug. ***
Attachment #169340 - Attachment is obsolete: true
Attachment #169463 - Flags: first-review?(psychoticwolf)
Attachment #168605 - Flags: first-review+
Attached patch nits fixed (obsolete) — Splinter Review
Attachment #169463 - Attachment is obsolete: true
Attachment #169464 - Flags: first-review?(psychoticwolf)
Attached patch missed a bitSplinter Review
Attachment #169464 - Attachment is obsolete: true
Attachment #169467 - Flags: first-review?(psychoticwolf)
Attachment #169467 - Flags: first-review?(psychoticwolf) → first-review?(cst)
Attachment #169467 - Flags: first-review?(cst) → first-review?(cbeard)
Attachment #169464 - Flags: first-review?(psychoticwolf)
Attachment #169463 - Flags: first-review?(psychoticwolf)
Attachment #169467 - Flags: first-review?(cbeard) → first-review?(mconnor)
Comment on attachment 169467 [details] [diff] [review]
missed a bit

>Index: rss/inc_rssfeed.php
>===================================================================
>RCS file: /cvsroot/mozilla/webtools/update/rss/inc_rssfeed.php,v
>retrieving revision 1.1
>diff --unified=4 -d -p -u -p -r1.1 inc_rssfeed.php
>--- rss/inc_rssfeed.php	24 Aug 2004 19:39:49 -0000	1.1
>+++ rss/inc_rssfeed.php	23 Dec 2004 21:21:01 -0000
>@@ -1,95 +1,96 @@
>-<?php
>-// ***** BEGIN LICENSE BLOCK *****
>-// Version: MPL 1.1/GPL 2.0/LGPL 2.1
>-//
>-// The contents of this file are subject to the Mozilla Public License Version
>-// 1.1 (the "License"); you may not use this file except in compliance with
>-// the License. You may obtain a copy of the License at
>-// http://www.mozilla.org/MPL/
>-//
>-// Software distributed under the License is distributed on an "AS IS" basis,
>-// WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
>-// for the specific language governing rights and limitations under the
>-// License.
>-//
>-// The Original Code is Mozilla Update.
>-//
>-// The Initial Developer of the Original Code is
>-// Chris "Wolf" Crews.
>-// Portions created by the Initial Developer are Copyright (C) 2004
>-// the Initial Developer. All Rights Reserved.
>-//
>-// Contributor(s):
>-//   Chris "Wolf" Crews <psychoticwolf@carolina.rr.com>
>-//   Alan Starr <alanjstarr@yahoo.com>
>-//
>-// Alternatively, the contents of this file may be used under the terms of
>-// either the GNU General Public License Version 2 or later (the "GPL"), or
>-// the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
>-// in which case the provisions of the GPL or the LGPL are applicable instead
>-// of those above. If you wish to allow use of your version of this file only
>-// under the terms of either the GPL or the LGPL, and not to allow others to
>-// use your version of this file under the terms of the MPL, indicate your
>-// decision by deleting the provisions above and replace them with the notice
>-// and other provisions required by the GPL or the LGPL. If you do not delete
>-// the provisions above, a recipient may use your version of this file under
>-// the terms of any one of the MPL, the GPL or the LGPL.
>-//
>-// ***** END LICENSE BLOCK *****
>-?>
>-<?php
>-//  http://blogs.law.harvard.edu/tech/rss
>-
>-switch ($type) {
>-  case "E":
>-    $listType = "Extensions";
>-    break;
>-  case "P":
>-    $listType = "Plugins";
>-    break;
>-  case "T":
>-    $listType = "Themes";
>-    break;
>-}
>-
>-echo "<rss version=\"2.0\">\n";
>-echo "<channel>\n";
>-
>-echo "  <title>" . htmlentities($sitetitle) . "::" . htmlentities($list) . " " . $listType . "</title>\n";
>-echo "  <link>" . htmlentities($siteurl) . "</link>\n";
>-echo "  <description>" . htmlentities($description) . "</description>\n";
>-echo "  <language>" . htmlentities($sitelanguage) . "</language>\n";
>-echo "  <copyright>" . htmlentities($sitecopyright) . "</copyright>\n";
>-echo "  <lastBuildDate>" . $currenttime . "</lastBuildDate>\n";
>-echo "  <ttl>" . $rssttl . "</ttl>\n";
>-echo "  <image>\n";
>-echo "    <title>" . htmlentities($sitetitle) . "</title>\n";
>-echo "    <link>" . htmlentities($siteurl) . "</link>\n";
>-echo "    <url>" . htmlentities($siteicon) . "</url>\n";
>-echo "    <width>16</width>\n";
>-echo "    <height>16</height>\n";
>-echo "  </image>\n";
>-
>- $sql_result = mysql_query($sql, $connection) or trigger_error("MySQL Error ".mysql_errno().": ".mysql_error()."", E_USER_NOTICE);
>-  while ($row = mysql_fetch_array($sql_result)) {
>-    $id = $row["ID"];
>-    $title = htmlentities($row["Title"]);
>-    $description = htmlentities($row["Description"]);
>-    $dateupdated = gmdate("r", strtotime($row["DateStamp"]));
>-    $version = $row["Version"];
>-    $vid = $row["vID"];
>-    $appname = $row["AppName"];
>-
>-    echo "    <item>\n";
>-    echo "      <pubDate>" . $dateupdated . "</pubDate>\n";
>-    echo "      <title>" . $title . " " . $version . " for " . $appname . "</title>\n";
>-    echo "      <link>http://$sitehostname/" . strtolower($listType) . "/moreinfo.php?id=" . $id . "&amp;vid=" . $vid . "</link>\n";
>-    echo "      <description>" . $description . "</description>\n";
>-    echo "    </item>\n";
>-
>-  }
>-
>-echo "</channel>\n";
>-echo "</rss>\n";
>-
>-?>
>+<?php
>+// ***** BEGIN LICENSE BLOCK *****
>+// Version: MPL 1.1/GPL 2.0/LGPL 2.1
>+//
>+// The contents of this file are subject to the Mozilla Public License Version
>+// 1.1 (the "License"); you may not use this file except in compliance with
>+// the License. You may obtain a copy of the License at
>+// http://www.mozilla.org/MPL/
>+//
>+// Software distributed under the License is distributed on an "AS IS" basis,
>+// WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
>+// for the specific language governing rights and limitations under the
>+// License.
>+//
>+// The Original Code is Mozilla Update.
>+//
>+// The Initial Developer of the Original Code is
>+// Chris "Wolf" Crews.
>+// Portions created by the Initial Developer are Copyright (C) 2004
>+// the Initial Developer. All Rights Reserved.
>+//
>+// Contributor(s):
>+//   Chris "Wolf" Crews <psychoticwolf@carolina.rr.com>
>+//   Alan Starr <alanjstarr@yahoo.com>
>+//
>+// Alternatively, the contents of this file may be used under the terms of
>+// either the GNU General Public License Version 2 or later (the "GPL"), or
>+// the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
>+// in which case the provisions of the GPL or the LGPL are applicable instead
>+// of those above. If you wish to allow use of your version of this file only
>+// under the terms of either the GPL or the LGPL, and not to allow others to
>+// use your version of this file under the terms of the MPL, indicate your
>+// decision by deleting the provisions above and replace them with the notice
>+// and other provisions required by the GPL or the LGPL. If you do not delete
>+// the provisions above, a recipient may use your version of this file under
>+// the terms of any one of the MPL, the GPL or the LGPL.
>+//
>+// ***** END LICENSE BLOCK *****
>+?>
>+<?php
>+//  http://blogs.law.harvard.edu/tech/rss
>+
>+switch ($type) {
>+  case "E":
>+    $listType = "Extensions";
>+    break;
>+  case "P":
>+    $listType = "Plugins";
>+    break;
>+  case "T":
>+    $listType = "Themes";
>+    break;
>+}
>+
>+echo "<rss version=\"2.0\">\n";
>+echo "<channel>\n";
>+
>+echo "  <title>" . htmlspecialchars($sitetitle,ENT_NOQUOTES,UTF-8) . "::" . htmlspecialchars($list,ENT_NOQUOTES,UTF-8) . " " . $listType . "</title>\n";
>+echo "  <link>" . $siteurl . "</link>\n";
>+echo "  <description>" . htmlspecialchars($sitedescription,ENT_NOQUOTES,UTF-8) . 
>+"</description>\n";
>+echo "  <language>" . $sitelanguage . "</language>\n";
>+echo "  <copyright>" . htmlspecialchars($sitecopyright,ENT_NOQUOTES,UTF-8) . "</copyright>\n";
>+echo "  <lastBuildDate>" . $currenttime . "</lastBuildDate>\n";
>+echo "  <ttl>" . $rssttl . "</ttl>\n";
>+echo "  <image>\n";
>+echo "    <title>" . htmlspecialchars($sitetitle,ENT_NOQUOTES,UTF-8) . "</title>\n";
>+echo "    <link>" . $siteurl . "</link>\n";
>+echo "    <url>" . $siteicon . "</url>\n";
>+echo "    <width>16</width>\n";
>+echo "    <height>16</height>\n";
>+echo "  </image>\n";
>+
>+ $sql_result = mysql_query($sql, $connection) or trigger_error("MySQL Error ".mysql_errno().": ".mysql_error()."", E_USER_NOTICE);
>+  while ($row = mysql_fetch_array($sql_result)) {
>+    $id = $row["ID"];
>+    $title = htmlspecialchars($row["Title"],ENT_NOQUOTES,UTF-8);
>+    $description = htmlspecialchars($row["Description"],ENT_NOQUOTES,UTF-8);
>+    $dateupdated = gmdate("r", strtotime($row["DateStamp"]));
>+    $version = htmlspecialchars($row["Version"],ENT_NOQUOTES,UTF-8);
>+    $vid = $row["vID"];
>+    $appname = htmlspecialchars($row["AppName"],ENT_NOQUOTES,UTF-8);
>+
>+    echo "    <item>\n";
>+    echo "      <pubDate>" . $dateupdated . "</pubDate>\n";
>+    echo "      <title>" . $title . " " . $version . " for " . $appname . "</title>\n";
>+    echo "      <link>http://$sitehostname/" . strtolower($listType) . "/moreinfo.php?id=" . $id . "&amp;vid=" . $vid . "</link>\n";
>+    echo "      <description>" . $description . "</description>\n";
>+    echo "    </item>\n";
>+
>+  }
>+
>+echo "</channel>\n";
>+echo "</rss>\n";
>+
>+?>
>Index: rss/index.php
>===================================================================
>RCS file: /cvsroot/mozilla/webtools/update/rss/index.php,v
>retrieving revision 1.3
>diff --unified=4 -d -p -u -p -r1.3 index.php
>--- rss/index.php	9 Dec 2004 06:34:53 -0000	1.3
>+++ rss/index.php	23 Dec 2004 21:21:01 -0000
>@@ -45,17 +45,16 @@ $type = escape_string($_GET["type"]); //
> $list = ucwords(strtolower($_GET["list"])); // Newest, Updated, [Editors], Popular
> 
> $sitetitle = "Mozilla Update";
> $siteicon = "http://www.mozilla.org/images/mozilla-16.png";
>-$siteurl = "http://update.mozilla.org";
>+$siteurl = $_SERVER["SERVER_NAME"];
> $sitedescription = "the way to keep your mozilla software up-to-date";
>-$sitelanguage = "en-us";
>-$sitecopyright = "Creative Commons?";
>+$sitelanguage = "en-US";
>+$sitecopyright = "Copyright 2004-2005 The Mozilla Organization";
> $currenttime = gmdate(r);// GMT 
> $rssttl = "120"; //Life of feed in minutes
> 
>-//header("Content-Type: application/octet-stream");
>-header("Content-Type: text/xml");
>+header("Content-Type: application/xml; charset=utf-8");
> 
> // Firefox, extensions, by date added
> 
> $select = "SELECT DISTINCT 
>@@ -103,5 +102,5 @@ $sql = $select . " " . $from . " WHERE "
> 
> include"inc_rssfeed.php";
> 
> 
>-?>
>\ No newline at end of file
>+?>
Attachment #169467 - Flags: first-review?(mconnor) → first-review+
Checked into branch and trunk by mconnor at 2005-01-16 13:37
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
<b>Warning</b>
:  htmlspecialchars() [
<a href="http://www.php.net/function.htmlspecialchars">function.htmlspecialchars</a>
]: charset `-8' not supported, assuming iso-8859-1 in 
-
	<b>
D:\cvs-1.11.5\mozilla\webtools\update\rss\inc_rssfeed.php
</b>
 on line 
<b>58</b>
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Wups, sorry, UTF-8 is a string there, not a constant like ENT_NOQUOTES. 

htmlspecialchars($sitetitle,ENT_NOQUOTES,"UTF-8") etc.
Ok, mconnor has the fix on his tree.  Hopefully this will land in cvs soon.  Not
going to bother to open a new bug.
Status: REOPENED → ASSIGNED
Fix checked into trunk and branch by mconnor.
Status: ASSIGNED → RESOLVED
Closed: 20 years ago20 years ago
Resolution: --- → FIXED
Product: addons.mozilla.org → addons.mozilla.org Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: